CHAPTER 1


INTRODUCTION 

1.1 Motivation

Many systems we encounter in our daily routines have these dominant
features: i) unknown or partial knowledge of system dynamics; ii) presence
of multiple decision-makers or controllers, each of whom has a different
objective; iii) presence of unmeasurable disturbances. Examples of such
systems include distributed industrial systems, power and energy systems,
transportation systems, environmental systems, biological systems and
socio-economic systems, just to name a few. Optimization of such systems
falls naturally into the framework of stochastic adaptive games.

A dynamic game is a system characterized by the presence of multiple
decision-makers. The theory of games first attained its formalism with
the publication of the book "Theory of Games and Economic Behavior"
by [43]. The majority of the work in game theory has been done for systems
with known parameters [20, 21]. In this thesis, we propose an adaptive
procedure to tackle the game problem when we have no information or just
partial knowledge of the system parameters. This particular adaptive
algorithm, which incorporates a minimum variance control strategy and a
least squares identification scheme, is the Self-Tuning Strategy [1, 5,
45]. The reason for using the Self-Tuning Strategy in tackling the
Stochastic Adaptive Game problem is primarily the simplicity of
the algorithm and its proven success in industrial applications [2, 6].


The Self-Tuning Strategy is basically a suboptimal control scheme
because the design of the control signal does not take into consideration
the effect of the control signal on the estimation of the system dynamics.
In the design of stochastic adaptive controllers, the role of the control
signal is two-fold: i) the attainment of the control objective; ii) the
identification of the system parameters or dynamics [4, 8, 9, 56]. This
dual nature of the control signal was first pointed out in [22].
Controllers which take into account the dual nature of the control
system are classified as dual controllers. By this definition, the
Self-Tuning Strategy is a non-dual type algorithm because it approaches the
estimation problem and the control problem independently and assumes no
interaction exists between the two problems. Even though the Self-Tuning
Strategy is a non-dual adaptive control method, it has received wide
attention and generated a substantial amount of results, both at the
theoretical level and in practical applications, primarily due to its
simplicity and ease of implementation. The advent of the microprocessor,
with its falling cost and rising computing power, has allowed a prototype
portable self-tuner to be constructed and tested on site for various
industrial processes [19]. These self-tuners are particularly appealing
in the following situations:

i) frequent manual retuning needed for the traditional three-term
PID (Proportional, Integral, Derivative) control scheme;
ii) frequent changes in set point for linearized system dynamics;
iii) presence of noise in the system;
iv) presence of slowly time-varying system parameters.


We hope the proven applications of the self-tuner will provide us
with a practical tool for solving the stochastic adaptive game problem
and ultimately enable us to apply, with ease, the theory of games to
the numerous systems we encounter daily.

1.3 Thesis Outline

In this thesis, we will utilize the self-tuning method to solve the
stochastic adaptive Nash game and Stackelberg game problems. Our
objective is to seek steady state game solutions that can be practically
implemented with ease. Indeed, by restricting the cost functions of
each decision-maker to a certain class, we obtain solutions for the game
problem which, after a certain transformation, closely resemble the
solution of the self-tuning control problem with only one decision-maker.
This close resemblance implies that the computation of the game solution
can be carried out using methods similar to those used for the self-tuners.
Microprocessor implementation, naturally, is a desirable goal.

Since our approach is based on the self-tuning principle, we will
briefly review the various aspects of this theory in Chapter 2. We will
concentrate on the original self-tuning regulator [6] and a generalized
self-tuning method proposed in [16, 17]. An extension and a new convergence
result for the method in [16, 17] are also presented in this chapter.

In Chapters 3 and 4, we will define and formulate the stochastic
adaptive Nash game and Stackelberg game problems respectively. We will
assume a centralized information pattern, that is, the game problems will
be solved with the assumption that every decision-maker has the same
input-output data about the system. Convergence for the game problem
will be shown by extending the convergence results of the self-tuning
controller with one decision-maker.

In Chapter 5, decentralized stochastic adaptive Nash games will be
considered. Specifically, we will consider a "one-step-delay information
sharing pattern". By restricting the cost functions of the decision-makers
to single-stage, an adaptive game solution is obtained by
extending the results of static games with known parameters. We also
obtain a similar adaptive solution by a straightforward constraint on the
form of each decision-maker's control law. Simulation results using
these procedures are presented.


CHAPTER  2 


SELF-TUNING PRINCIPLE


2.1 Introduction

The self-tuning method has been proposed to control processes with
unknown parameters and unmeasurable disturbances. In this chapter, we
will review the underlying idea of self-tuning for the single
decision-maker, single criterion case. A new convergence result and an
extension are also presented.

In Section 2.2, the Self-Tuning Regulator (STR) of [5] will be
reviewed. The STR basically combines a minimum variance control law and
a least squares estimator to deal with the unknown parameters and noisy
system. A variation of the STR, the Self-Tuning Controller (STC) in
[16, 17, 18, 25, 26], will be reviewed and extended in Section 2.3.
Properties of these controllers are discussed.

In Section 2.4, convergence results for the STR will be presented
and we will show how the convergence results for the STR can be carried
over to the STC. It is worth mentioning at this point that
a similar procedure will be used in obtaining convergence results for the
game problems. In other words, we will show how the convergence results
for the STR can be carried over to the Nash game and Stackelberg problems.

Finally, in Section 2.5, an example based on a paper making machine
in [14] is simulated using the two different self-tuners.


The process to be regulated is formulated in an input-output model
form. The objective of the control action is to minimize the output
variances of the process. To review the STR concept, we will first
present the minimum variance strategy for the system assuming complete
knowledge of the system parameters. Then, the adaptive minimum variance
control law to deal with unknown system parameters is presented. Further
details can be found in [6, 14].

2.2.1  Minimum  Variance  Strategy 

The process to be controlled is governed by

$$A(q^{-1})\,y(t) = B(q^{-1})\,u(t-k-1) + C(q^{-1})\,e(t), \qquad k \ge 0 \tag{2.1}$$

where $q^{-1}$ denotes the backward shift operator, k is the time delay, y is the
output vector, u is the input vector, and $\{e(t)\}$ is a sequence of
independent, identically distributed random vectors with zero mean and finite
covariance. The vectors y, u, and e all have the same dimension p. The
polynomial matrices A(z), B(z), and C(z) are all of dimension p × p and are
given by

$$A(z) = I + A_1 z + \dots + A_n z^n \tag{2.2a}$$

$$B(z) = B_0 + B_1 z + \dots + B_{n-1} z^{n-1}, \qquad B_0 \text{ non-singular} \tag{2.2b}$$

$$C(z) = I + C_1 z + \dots + C_n z^n \tag{2.2c}$$

where det B(z) and det C(z) have all their zeros strictly outside the unit
circle.


The objective of the control action is to minimize, with respect to u(t)
and given the input-output data up to time t, a cost function J given
by

$$J = E\{\, y^T(t+k+1)\, Q\, y(t+k+1) \,\} \tag{2.3}$$

where E denotes the expectation operator and Q is a p × p symmetric
positive semidefinite matrix. The minimum variance strategy minimizes J
over all admissible controls u(t), that is, all u(t) that are
functions of the current and past outputs y(t), y(t-1), ... and the past
inputs u(t-1), u(t-2), ... .

It can be shown that the minimum variance strategy is given by

$$\tilde G(q^{-1})\,y(t) + \tilde F(q^{-1})\,B(q^{-1})\,u(t) = 0 \tag{2.4}$$

where $\tilde F(z)$ and $\tilde G(z)$ satisfy

$$C(z) = A(z)F(z) + z^{k+1}G(z) \tag{2.5a}$$

$$\tilde F(z)\,G(z) = \tilde G(z)\,F(z) \tag{2.5b}$$

$$\det \tilde F(z) = \det F(z), \qquad \tilde F(0) = I \tag{2.5c}$$

and F(z), G(z) are polynomial matrices given by

$$F(z) = I + F_1 z + \dots + F_k z^k \tag{2.6a}$$

$$G(z) = G_0 + G_1 z + \dots + G_{n-1} z^{n-1}. \tag{2.6b}$$

The derivation of the minimum variance strategy can be found in [14].

With this strategy applied, the closed-loop system becomes

$$\tilde C(q^{-1})\,y(t) = \tilde C(q^{-1})\,F(q^{-1})\,e(t) \tag{2.7}$$

where $\tilde C(z)$ satisfies

$$\tilde F(z)\,C(z) = \tilde C(z)\,F(z). \tag{2.8}$$

Since $\det \tilde F(z) = \det F(z)$, it follows that $\det \tilde C(z) = \det C(z)$.
Thus, the closed-loop system is stable, as det C(z) is assumed to have all
its zeros strictly outside the unit circle.

The control error with this strategy is asymptotically given by

$$y(t) = F(q^{-1})\,e(t) \tag{2.9}$$

which is a moving average of order k of the noise e(t).
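As an illustration of the identity (2.5a), the following sketch solves it for the scalar case by power-series long division. The function names and the numerical values a(z) = 1 - 0.9z, c(z) = 1 + 0.5z, k = 1 are assumptions chosen for the example and do not come from the thesis.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def solve_identity(a, c, k):
    """Scalar version of (2.5a): find f, g with c(z) = a(z) f(z) + z**(k+1) g(z).

    a and c are ascending coefficient lists with a[0] == 1; deg f = k.
    """
    n = len(a) - 1
    c = list(c) + [0.0] * max(0, n + 1 - len(c))  # pad c up to degree n
    f = []
    for i in range(k + 1):
        # Coefficient i of c - a*f must vanish for i = 0, ..., k.
        ci = c[i] if i < len(c) else 0.0
        acc = sum(a[j] * f[i - j] for j in range(1, min(i, n) + 1))
        f.append((ci - acc) / a[0])
    af = poly_mul(a, f)
    length = max(len(af), len(c))
    rem = [(c[i] if i < len(c) else 0.0) - (af[i] if i < len(af) else 0.0)
           for i in range(length)]
    g = rem[k + 1:]  # what is left after removing the factor z**(k+1)
    return f, g


# Hypothetical example: a(z) = 1 - 0.9 z, c(z) = 1 + 0.5 z, delay k = 1.
f, g = solve_identity([1.0, -0.9], [1.0, 0.5], k=1)
# Under minimum variance control y(t) = f(q^-1) e(t), so the output variance
# is sum(f_i**2) times the noise variance.
variance_gain = sum(fi * fi for fi in f)
```

For these values f(z) = 1 + 1.4z and g(z) = 1.26, so the controlled output is a moving average of order k = 1 of the noise, in line with (2.9).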

2.2.2 Regulator for System with Unknown Parameters

In order to control the process given by (2.1) with unknown parameters,
the following model is used to represent the process:

$$y(t) + \mathcal{A}(q^{-1})\,y(t-k-1) = \mathcal{B}(q^{-1})\,u(t-k-1) + \varepsilon(t) \tag{2.10}$$

where

$$\mathcal{A}(z) = \mathcal{A}_0 + \mathcal{A}_1 z + \dots + \mathcal{A}_{n_{\mathcal A}} z^{n_{\mathcal A}} \tag{2.11a}$$

$$\mathcal{B}(z) = \mathcal{B}_0 + \mathcal{B}_1 z + \dots + \mathcal{B}_{n_{\mathcal B}} z^{n_{\mathcal B}} \tag{2.11b}$$

and $\varepsilon(t)$ is the error to be minimized in the least squares sense.

The minimum variance strategy for the process (2.10) is given by

$$\mathcal{B}(q^{-1})\,u(t) = \mathcal{A}(q^{-1})\,y(t). \tag{2.12}$$

At each instant of time, the STR performs a least squares
estimation for the model given by (2.10). The estimates $\hat{\mathcal A}(z)$ and
$\hat{\mathcal B}(z)$ of $\mathcal A(z)$ and $\mathcal B(z)$ respectively are then substituted
into (2.12) to obtain the optimal control u(t). The certainty equivalence
principle is invoked during the control calculation, as we have assumed that
the optimal control signal can be obtained even with the estimates
substituted for the true parameters. That is, we have assumed that

$$\hat{\mathcal B}(q^{-1})\,u(t) = \hat{\mathcal A}(q^{-1})\,y(t) \tag{2.13}$$

will yield the same optimal control as (2.4).

In the adaptive control literature, this method is classified as an
implicit method or direct method, since the parameters of the system are
not estimated explicitly (thus implicit method) and the parameters
of the regulator are estimated directly (thus direct method). If an
explicit estimation of the system parameters is carried out, that is, if the
estimates $\hat A(z)$, $\hat B(z)$, and $\hat C(z)$ are obtained for the process (2.1),
polynomial matrix factorizations and computations have to be carried
out before arriving at the optimal control signal u(t). The direct
method here allows simpler and faster computation of the optimal control.
To estimate the parameters of the regulator recursively, the
following least squares procedure may be used [3, 6]. Introduce the
parameter matrix $\Theta$ given by

$$\Theta = [\,\mathcal A_0 \;\cdots\; \mathcal A_{n_{\mathcal A}} \;\; \mathcal B_0 \;\cdots\; \mathcal B_{n_{\mathcal B}}\,]^T = [\,\theta_1 \;\; \theta_2 \;\; \theta_3 \;\cdots\; \theta_p\,]. \tag{2.14}$$

The following recursions are carried out at each step of time to estimate
$\Theta$ for i = 1, 2, ..., p:

$$\theta_i(t) = \theta_i(t-1) + K(t-1)\,[\,y_i(t) - \eta(t-k-1)\,\theta_i(t-1)\,] \tag{2.15}$$

$$K(t-1) = P(t-1)\,\eta^T(t-k-1)\,[\,1 + \eta(t-k-1)\,P(t-1)\,\eta^T(t-k-1)\,]^{-1} \tag{2.16}$$

$$P(t) = P(t-1) - K(t-1)\,[\,1 + \eta(t-k-1)\,P(t-1)\,\eta^T(t-k-1)\,]\,K^T(t-1) \tag{2.17}$$

with $y_i$ being the i-th component of y and

$$\eta(t-k-1) = [\,-y^T(t-k-1) \;\cdots\; -y^T(t-k-1-n_{\mathcal A}) \;\; u^T(t-k-1) \;\cdots\; u^T(t-k-1-n_{\mathcal B})\,] \tag{2.18}$$

where $n_{\mathcal A} = n - 1$ and $n_{\mathcal B} = n + k - 1$. It can be observed that if the
initial value of P(t) is the same for all p estimations, the corresponding
gain matrix K(t) is the same at each step of time for all of the parameter
vectors $\theta_i$ (i = 1, 2, ..., p).

In this manner, considerable computational effort can be saved. Another
remark is that other types of least squares estimation schemes may also
be used to deal with slowly varying parameters or to enhance the numerical
stability of the estimation computations.
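The recursions (2.15)-(2.17) can be sketched as follows, using plain Python lists. The function name `rls_step`, the large initial value of P and the test data are assumptions made for illustration only; the sketch shows how one gain K is computed once per time step and shared by all p parameter vectors.

```python
def rls_step(theta_cols, P, eta, y):
    """One step of the recursions (2.15)-(2.17).

    theta_cols: list of p parameter vectors (each of length m), updated in place
    P:          m x m matrix as a list of lists, updated in place
    eta:        regressor (length m), playing the role of eta(t-k-1)
    y:          current output vector (length p)
    """
    m = len(eta)
    P_eta = [sum(P[r][c] * eta[c] for c in range(m)) for r in range(m)]
    denom = 1.0 + sum(eta[r] * P_eta[r] for r in range(m))
    K = [v / denom for v in P_eta]                  # (2.16), shared gain
    for i, theta in enumerate(theta_cols):          # (2.15), one update per output
        err = y[i] - sum(eta[r] * theta[r] for r in range(m))
        for r in range(m):
            theta[r] += K[r] * err
    for r in range(m):                              # (2.17)
        for c in range(m):
            P[r][c] -= K[r] * denom * K[c]
    return K


# Hypothetical noise-free demonstration with p = 2 outputs and m = 2 regressors.
true_cols = [[0.5, -1.0], [2.0, 0.25]]
theta = [[0.0, 0.0], [0.0, 0.0]]
P = [[1e6, 0.0], [0.0, 1e6]]
for eta in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]:
    y = [sum(e * t for e, t in zip(eta, col)) for col in true_cols]
    rls_step(theta, P, eta, y)
```

With noise-free data and a large initial P, the estimates in `theta` approach `true_cols` after a few sufficiently varied regressors.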

Results for convergence of this adaptive algorithm have been reported in
[27, 28, 38, 39]. In [38, 39], convergence of the optimal control, based on
convergence of the estimates, is guaranteed for single input single output
systems if the system input-output remains bounded and if a certain positive
real condition on the noise dynamics C(z) is satisfied. In [27, 28], the
boundedness assumption on the system variables is removed. Further discussion
will be presented in Section 2.4. What should be kept in mind at this point
is that these convergence results developed for the STR will be used to
determine convergence, first of the self-tuning controller (STC), which
is presented in the next section, and eventually of the stochastic adaptive
game problems.

2.3  Self-Tuning  Controller  (STC) 

The single input single output STC in [17] is basically the same as
the STR except for a penalty on the control signal in the cost function.
The presence of a penalty may reduce the excessive control signal
magnitude that is common for the STR. It may also offer a simpler method
to deal with nonminimum phase processes [25, 29]. We will generalize the
single input single output (SISO) case to the multiple input multiple output
(MIMO) case here. See also [33] for another approach.

The cost function for the STC is given by

$$J = E\{\, y^T(t+k+1)\, Q\, y(t+k+1) + u^T(t)\, R\, u(t) \,\} \tag{2.19}$$

where Q is a symmetric positive semidefinite matrix and R is a symmetric
positive definite matrix.

To facilitate our analysis in the latter part of the report, we will
consider the process to be governed by

$$a(q^{-1})\,y(t) = B(q^{-1})\,u(t-k-1) + C(q^{-1})\,e(t) \tag{2.20}$$

where k is the known time delay, a(z) is a scalar polynomial, and B(z),
C(z) are polynomial matrices given by

$$a(z) = 1 + a_1 z + \dots + a_n z^n \tag{2.21a}$$

$$B(z) = B_0 + B_1 z + \dots + B_{n-1} z^{n-1} \tag{2.21b}$$

$$C(z) = I + C_1 z + \dots + C_n z^n \tag{2.21c}$$

and det C(z) has all its zeros outside the unit circle. The process given by
(2.1) can readily be converted to (2.20) as shown in Appendix A.

2.3.1  Controller  Design  with  Known  Parameters 

We will, again, first consider the system with known parameters and
derive a strategy that minimizes (2.19); the adaptive algorithm
for unknown parameters will then be presented in the next section.

Theorem 2.1. The control law that minimizes (2.19) for the system
(2.20) satisfies

$$M\,G(q^{-1})\,\bar C(q^{-1})\,y(t) + [\,M\,F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1}) + c(q^{-1})\,H\,]\,u(t) = 0 \tag{2.22}$$

where

$$M = B_0^T Q \tag{2.23a}$$

$$H = R \tag{2.23b}$$

$$\bar C(z) = \operatorname{adjoint}\, C(z) \tag{2.23c}$$

$$c(z) = \det C(z) \tag{2.23d}$$

$$G(z) = G_0 + G_1 z + \dots + G_{n-1} z^{n-1} \tag{2.23e}$$

$$F(z) = I + F_1 z + \dots + F_k z^k \tag{2.23f}$$

and G(z), F(z) satisfy

$$C(z) = a(z)F(z) + z^{k+1}G(z). \tag{2.24}$$


Proof. Premultiply (2.20) by $q^{k+1} F(q^{-1})\,\bar C(q^{-1})$ to obtain

$$F(q^{-1})\,\bar C(q^{-1})\,a(q^{-1})\,y(t+k+1) = F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1})\,u(t) + F(q^{-1})\,\bar C(q^{-1})\,C(q^{-1})\,e(t+k+1).$$

Using (2.24), the above equation becomes

$$[\,C(q^{-1}) - q^{-(k+1)}G(q^{-1})\,]\,\bar C(q^{-1})\,y(t+k+1) = F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1})\,u(t) + F(q^{-1})\,\bar C(q^{-1})\,C(q^{-1})\,e(t+k+1)$$

or

$$c(q^{-1})\,[\,y(t+k+1) - F(q^{-1})\,e(t+k+1)\,] = G(q^{-1})\,\bar C(q^{-1})\,y(t) + F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1})\,u(t) \tag{2.25}$$

where the fact $c(z)\,I = C(z)\,\bar C(z)$ has been used. Denote

$$y^*(t+k+1|t) = y(t+k+1) - F(q^{-1})\,e(t+k+1), \tag{2.26}$$

that is, $y^*$ is the least squares optimal predictor of y given the data
up to time t, which is uncorrelated with $F(q^{-1})\,e(t+k+1)$. Combining
(2.25) and (2.26) yields

$$c(q^{-1})\,y^*(t+k+1|t) = G(q^{-1})\,\bar C(q^{-1})\,y(t) + F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1})\,u(t). \tag{2.27}$$

Substituting (2.26) into the cost function (2.19) yields

$$J = E\{\,[F(q^{-1})e(t+k+1)]^T\, Q\, [F(q^{-1})e(t+k+1)]\,\} + E\{\, y^{*T}(t+k+1|t)\, Q\, y^*(t+k+1|t) + u^T(t)\, R\, u(t) \,\}. \tag{2.28}$$

The first term of (2.28) is related to the future noise covariance, which
cannot be optimized. Hence, the optimization is concentrated on the
second term. Assuming the existence of $\partial J/\partial u(t)$, the necessary condition
for a minimum, $\partial J/\partial u(t) = 0$, yields

$$B_0^T Q\, y^*(t+k+1|t) + R\, u(t) = 0, \quad \text{that is,} \quad M\, y^*(t+k+1|t) + H\, u(t) = 0. \tag{2.29}$$

Premultiplying (2.29) by $c(q^{-1})$ and combining the resulting equation with
(2.27), we have

$$M\,G(q^{-1})\,\bar C(q^{-1})\,y(t) + [\,M\,F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1}) + c(q^{-1})\,H\,]\,u(t) = 0$$

as stated in (2.22).

Q.E.D.
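To make the control law concrete, the following sketch specializes (2.22) to a scalar first-order process with k = 0, so that F = 1, G = c1 - a1, the adjoint of C is 1 and c(z) = 1 + c1 z. It then compares the resulting control with a direct minimization of q y*^2 + r u^2 at each step. All function names and numerical values are assumptions chosen for illustration; the comparison relies on the law having been applied from the start with zero initial conditions.

```python
import random


def stc_control(y_t, u_prev, a1, b0, c1, q, r):
    """Scalar, k = 0 case of (2.22): m*g0*y(t) + (m*b0 + h)*u(t) + h*c1*u(t-1) = 0."""
    m, h = b0 * q, r            # M = B0^T Q, H = R
    g0 = c1 - a1                # from c(z) = a(z)*1 + z*g0
    return -(m * g0 * y_t + h * c1 * u_prev) / (m * b0 + h)


def direct_minimizer(y_t, ystar_prev, a1, b0, c1, q, r):
    """Minimize q*ystar**2 + r*u**2 where ystar = s + b0*u, see (2.27)."""
    s = -c1 * ystar_prev + (c1 - a1) * y_t
    return -q * b0 * s / (q * b0 * b0 + r), s


def compare(steps=50, a1=-0.9, b0=1.5, c1=0.4, q=1.0, r=0.5, seed=0):
    """Run the loop and return the largest gap between the two controls."""
    rng = random.Random(seed)
    y, u_prev, ystar_prev, e_prev = 0.0, 0.0, 0.0, 0.0
    worst = 0.0
    for _ in range(steps):
        u_law = stc_control(y, u_prev, a1, b0, c1, q, r)
        u_dir, s = direct_minimizer(y, ystar_prev, a1, b0, c1, q, r)
        worst = max(worst, abs(u_law - u_dir))
        ystar_prev = s + b0 * u_law                   # predictor y*(t+1|t)
        e = rng.gauss(0.0, 1.0)                       # e(t+1)
        y = -a1 * y + b0 * u_law + e + c1 * e_prev    # scalar form of (2.20)
        u_prev, e_prev = u_law, e
    return worst


gap = compare()
```

The two controls agree up to rounding error because M y* + H u = 0 held at the previous step, which is exactly what lets the term in y*(t|t-1) be replaced by a term in u(t-1).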


It is noteworthy that in the STR, where there is no
penalty on control, the optimal control is obtained by setting
the predictor $y^*$ to zero. This setting of the least squares optimal
predictor to zero to compute the optimal control is the underlying factor
that enables the parameters of the STR to be directly estimated in the
adaptive situation. It would be most convenient if a similar direct method
could be used for the STC. To accomplish this goal, we continue
our analysis of the optimal control equation (2.22) for known parameters
to see if a suitable optimal least squares predictor function can be
obtained.

Define a function $\phi^*$ such that

$$\phi^*(t+k+1|t) = M\,y^*(t+k+1|t) + H\,u(t) \tag{2.30}$$

where M and H are as defined in Theorem 2.1. From (2.29), the optimal
control is obtained by setting $\phi^*$ to zero; thus, $\phi^*$ seems to be a possible
candidate for the predictor function.

Let the function $\phi$ be defined by

$$\phi(t+k+1) = M\,y(t+k+1) + H\,u(t). \tag{2.31}$$

Equations (2.20), (2.27), (2.30) and (2.31) yield the following system:

$$a(q^{-1})\,\phi(t) = (\,M\,B(q^{-1}) + H\,a(q^{-1})\,)\,u(t-k-1) + M\,C(q^{-1})\,e(t) \tag{2.32}$$

$$c(q^{-1})\,\phi^*(t+k+1|t) = M\,G(q^{-1})\,\bar C(q^{-1})\,y(t) + [\,M\,F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1}) + c(q^{-1})\,H\,]\,u(t) \tag{2.33}$$

$$\phi(t+k+1) = \phi^*(t+k+1|t) + M\,F(q^{-1})\,e(t+k+1). \tag{2.34}$$

Since $y^*(t+k+1|t)$ is uncorrelated with $F(q^{-1})\,e(t+k+1)$, $\phi^*(t+k+1|t)$
and $M\,F(q^{-1})\,e(t+k+1)$ are also uncorrelated, which implies that $\phi^*$ is the
least squares optimal predictor for $\phi$. Furthermore, if we define a new cost
function I given by

$$I = E\{\, \phi^T(t+k+1)\,\phi(t+k+1) \,\} \tag{2.35}$$

then the minimum variance strategy for the system governed by (2.32) with
cost function (2.35) is obtained by setting the optimal predictor $\phi^*$ to
zero.

To summarize, the original system given by

$$a(q^{-1})\,y(t) = B(q^{-1})\,u(t-k-1) + C(q^{-1})\,e(t)$$

$$c(q^{-1})\,y^*(t+k+1|t) = G(q^{-1})\,\bar C(q^{-1})\,y(t) + F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1})\,u(t)$$

$$y(t+k+1) - y^*(t+k+1|t) = F(q^{-1})\,e(t+k+1)$$

has been transformed to an equivalent system

$$a(q^{-1})\,\phi(t) = (\,M\,B(q^{-1}) + H\,a(q^{-1})\,)\,u(t-k-1) + M\,C(q^{-1})\,e(t)$$

$$c(q^{-1})\,\phi^*(t+k+1|t) = M\,G(q^{-1})\,\bar C(q^{-1})\,y(t) + [\,M\,F(q^{-1})\,\bar C(q^{-1})\,B(q^{-1}) + c(q^{-1})\,H\,]\,u(t)$$

$$\phi(t+k+1) - \phi^*(t+k+1|t) = M\,F(q^{-1})\,e(t+k+1)$$

with

$$\phi(t) = M\,y(t) + H\,u(t-k-1).$$

An additional advantage gained in transforming the original
system into a system with a structure similar to that of the STR is the
possibility of directly applying the convergence results for the STR to
the STC. In the latter part of the report, our convergence analysis for
the game problem will also be based on this approach of transforming the
original game problem to a system governed by (2.32)-(2.34).


With (2.22) applied, the closed-loop system becomes

$$c(q^{-1})\,(\,M + a(q^{-1})\,H\,B^{-1}(q^{-1})\,)\,y(t) = c(q^{-1})\,(\,M\,F(q^{-1}) + H\,B^{-1}(q^{-1})\,C(q^{-1})\,)\,e(t). \tag{2.36}$$

Since c(z) is assumed to have all its roots outside the unit circle, the
stability of the system depends on the roots of

$$\det(\,M\,B(z) + H\,a(z)\,) = 0. \tag{2.37}$$

Hence, by choosing M and/or H properly, the system can be stabilized even
with a B(z) that does not have all its zeros outside the unit disc.
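A scalar first-order instance of (2.37) shows how the control penalty can move the closed-loop root outside the unit circle. The function name and the numbers (a(z) = 1 - 0.5z, b(z) = 1 + 2z, which has its zero at z = -0.5 inside the unit disc) are assumptions for illustration, not values from the thesis.

```python
def closed_loop_root(m, h, a1, b0, b1):
    """Root of m*(b0 + b1*z) + h*(1 + a1*z) = 0, the scalar case of (2.37)."""
    return -(m * b0 + h) / (m * b1 + h * a1)


# Hypothetical nonminimum phase example.
a1, b0, b1 = -0.5, 1.0, 2.0
root_no_penalty = closed_loop_root(1.0, 0.0, a1, b0, b1)    # h = 0 (no penalty)
root_with_penalty = closed_loop_root(1.0, 2.0, a1, b0, b1)  # h = 2

# Stability requires the root to lie outside the unit circle, |z| > 1.
stable_no_penalty = abs(root_no_penalty) > 1.0
stable_with_penalty = abs(root_with_penalty) > 1.0
```

For this example the root is z = -0.5 without a penalty and z = -3 with h = 2; any h > 2/3 (with m = 1) places it outside the unit circle.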

2.3.2  Controller  for  System  with  Unknown  Parameters 
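In order to control the process given by (2.20) with unknown parameters,
the following model is used to represent the system:

$$\phi(t) + \mathcal{A}(q^{-1})\,y(t-k-1) = \mathcal{B}(q^{-1})\,u(t-k-1) + \varepsilon(t) \tag{2.38}$$

where $\phi$ is defined in (2.31) and

$$\mathcal{A}(z) = \mathcal{A}_0 + \mathcal{A}_1 z + \dots + \mathcal{A}_{n_{\mathcal A}} z^{n_{\mathcal A}}$$

$$\mathcal{B}(z) = \mathcal{B}_0 + \mathcal{B}_1 z + \dots + \mathcal{B}_{n_{\mathcal B}} z^{n_{\mathcal B}}$$

and $\varepsilon(t)$ is the error to be minimized in the least squares sense.

The certainty equivalent minimum variance control law for (2.38) is
given by

$$\hat{\mathcal B}(q^{-1})\,u(t) = \hat{\mathcal A}(q^{-1})\,y(t) \tag{2.39}$$

where $\hat{\mathcal B}(z)$ and $\hat{\mathcal A}(z)$ denote the least squares estimates of $\mathcal B(z)$ and
$\mathcal A(z)$ respectively.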

The following recursive estimation scheme may be used [27, 28].
Introduce a parameter matrix $\Theta$ as defined in (2.14). Then at each step
of time, the following recursions with k = 0 are carried out to estimate
$\Theta$ for i = 1, 2, ..., p:

$$\theta_i(t) = \theta_i(t-1) + \frac{\bar a}{r(t-1)}\,\eta^T(t-1)\,[\,\phi_i(t) - \eta(t-1)\,\theta_i(t-1)\,] \tag{2.40}$$

$$r(t-1) = r(t-2) + \eta(t-1)\,\eta^T(t-1), \qquad r(0) = 1 \tag{2.41}$$

where $\eta$ is given by

$$\eta(t) = [\,-y^T(t) \;\cdots\; -y^T(t-n+1) \;\; u^T(t) \;\cdots\; u^T(t-n+1)\,] \tag{2.42}$$

and $\bar a > 0$ is a constant. See [27] for general delay k > 0.
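A minimal sketch of one step of a normalized gradient recursion of the form (2.40)-(2.41) is given below for a single component. The function name `sg_step`, the default gain and the numbers are assumptions for illustration, and the update of r follows the usual stochastic-gradient form rather than a verified transcription of [27].

```python
def sg_step(theta, r_prev, eta_prev, phi_t, a_bar=1.0):
    """One step of a normalized gradient update for one component.

    theta:    current parameter vector theta_i(t-1)
    r_prev:   normalizing scalar r(t-2)
    eta_prev: regressor eta(t-1)
    phi_t:    newly observed component phi_i(t)
    """
    r = r_prev + sum(x * x for x in eta_prev)                  # cf. (2.41)
    err = phi_t - sum(x * th for x, th in zip(eta_prev, theta))
    new_theta = [th + (a_bar / r) * x * err                    # cf. (2.40)
                 for th, x in zip(theta, eta_prev)]
    return new_theta, r


# Hypothetical single step: the a posteriori error shrinks from 3 to 0.5.
theta1, r1 = sg_step([0.0, 0.0], 1.0, [1.0, 2.0], 3.0)
post_err = 3.0 - sum(x * th for x, th in zip([1.0, 2.0], theta1))
```

Unlike the least squares recursion (2.15)-(2.17), no matrix P is propagated; the step size is a scalar that decreases as the accumulated regressor energy r grows.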

Using $\phi$ as the input to the controller, it is possible to avoid some
of the complex matrix calculations and determine the controller parameters
directly. However, this may present a problem, since knowledge of the
$B_0$ parameter is required in computing the signal $\phi$. In the present case
of a single decision-maker, this is not a serious drawback, since an
arbitrary choice of the matrix M results only in a change of the penalty
on the output variances. Certainly, one method to overcome the problem
is to estimate the system parameters explicitly and go through all the
matrix computations. We will, however, at this point assume that the
$B_0$ parameter of the process is known, as this does not appear to be a very
stringent requirement in practical applications.
It should be noted that a similar derivation of the self-tuning
controller strategy is carried out in [33] using different system
representations. However, in our approach, by adhering to the system
representation (2.20), the convergence and stability results in [27] can be
readily established for the STC, as shown in the next section.


2.4 Convergence Analysis for Self-Tuning


It has been shown in [38, 39] for the SISO STR that the estimated
parameters of the regulator will converge and yield the optimal control
based on the true system parameters if the following conditions are satisfied:
i) the sequences {y(t)}, {u(t)} are uniformly bounded;
ii) the transfer function $\left(\dfrac{1}{c(q^{-1})} - \dfrac{1}{2}\right)$ is strictly positive real; and
iii) there is no factor common to A(z), B(z) and C(z) in (2.1).
A similar analysis for the MIMO case has been reported in [14].

In another approach using Martingale theory in [27], convergence of the
parameters is not explicitly required and the boundedness assumption on the
system input-output is removed. The result is stated in the following theorem.

Theorem 2.2 [27, Theorem 5.1]. Consider the cost function (2.3)
and the system (2.20), which satisfies the following assumptions:
i) the number of inputs p equals the number of outputs;
ii) the delay k = 0;
iii) upper bounds for the orders of the scalar polynomials appearing
in $\{a(q^{-1}),\, B(q^{-1}),\, C(q^{-1})\}$ are known;
iv) $\det B(z) \neq 0$ for $|z| \le 1$, and $\det C(z) \neq 0$ for $|z| \le 1$;
v) $\left(C(z) - \dfrac{\bar a}{2} I\right)$ is strictly positive real.
Then, with probability one,

$$\text{(1)} \quad \limsup_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \|y(t)\|^2 < \infty$$

$$\text{(2)} \quad \limsup_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \|u(t)\|^2 < \infty$$

$$\text{(3)} \quad \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} E\{y_i^2(t)\} = \gamma_i^2, \qquad i = 1, 2, \dots, p$$

where $\gamma_i^2$ is the minimum mean square error for any causal linear feedback
(including the one designed using the true system parameters), if the STR
algorithm in Section 2.2.2 is applied to the system.

Notice that the constant $\bar a$, which appears in (2.40), allows a
certain degree of freedom to ensure that condition v) is satisfied.

The result of Theorem 2.2 will be applied to the equivalent transformed
system when the STC algorithm is used. Hence, the process will be
governed by (2.32) with the cost function (2.35). By modifying certain
assumptions on the system, we see that the input-output will remain
bounded and the control error of the transformed system will achieve its
global minimum. Hence, we have the following theorem.


Theorem 2.3. Consider the cost function (2.19) and the transformed
system (2.32), which satisfies assumptions i), ii), v) of Theorem 2.2 and
the following conditions:
iii) upper bounds for the orders of the scalar polynomials appearing
in $\{a(q^{-1}),\, M B(q^{-1}) + H a(q^{-1}),\, C(q^{-1})\}$ are known;
iv) $\det(M B(z) + H a(z)) \neq 0$ for $|z| \le 1$, and
$\det C(z) \neq 0$ for $|z| \le 1$;
vi) $B_0$ is known.

If the STC algorithm in Section 2.3.2 is applied to the system, then
the input-output will remain bounded and the output error of $\phi(t)$ will
achieve the minimum achievable by any causal linear feedback.

Theorem 2.3 will be used in showing convergence of the game problems
that are considered in the latter part of the report.

2.5 Simulation Example

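To illustrate some of the features of the STR and STC, an example
based on a paper making machine is simulated and evaluated. See [14]
for details of the model.

The plant is governed by

$$y(t) + A_1\,y(t-1) = B_0\,u(t-1) + e(t)$$

where

$$A_1 = \begin{bmatrix} -0.99101 & 8.80512 \times 10^{-3} \\ -0.80610 & -0.77089 \end{bmatrix}, \qquad B_0 = \begin{bmatrix} 0.89889 & -4.59328 \times 10^{-3} \\ 19.390 & 0.88052 \end{bmatrix}, \qquad E\{e(t)\,e^T(t)\} = \begin{bmatrix} 0.02 & 0.35 \\ 0.35 & 7.6 \end{bmatrix}.$$

The cost function J is given by

$$J = E\{\, y^T(t+1)\, Q\, y(t+1) + u^T(t)\, R\, u(t) \,\}.$$

All initial parameter estimates except $\hat{\mathcal B}_0$ are set to zero; $\hat{\mathcal B}_0$ is set
to the identity matrix to prevent control saturation.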


Self-Tuning Regulator: The STR algorithm recursively estimates the
parameters of the model

$$y(t) + \mathcal A_0\,y(t-1) = \mathcal B_0\,u(t-1) + \varepsilon(t),$$

and the control is given by

$$u(t) = \hat{\mathcal B}_0^{-1}\,\hat{\mathcal A}_0\,y(t).$$

The input-output of a typical run will be presented and discussed later.

Self-Tuning Controller: The STC algorithm recursively estimates the
parameters of the model

$$\phi(t) + \mathcal A_0\,y(t-1) = \mathcal B_0\,u(t-1) + \varepsilon(t),$$

and the control is given by

$$u(t) = \hat{\mathcal B}_0^{-1}\,\hat{\mathcal A}_0\,y(t).$$

The input to the controller, $\phi(t)$, is given by

$$\phi(t) = y(t) + M^{-1} H\,u(t-1)$$

with $M = B_0^T Q$ and $H = R$. Notice that since $M^{-1}$ exists in this case, our
operation of premultiplying (2.31) by $M^{-1}$ is justified. Such a transformation
of $\phi$ will not affect the stabilizing property of the algorithm.

Comparisons of STR and STC: In order to compare the different
features of the two self-tuners, we have carried out the simulation in
two parts. We will first assume $B_0$ is known, thus $\mathcal B_0$ and M are known.

The input-output of a typical run is shown in Figures 2.1a-2.4a for the
STR and the corresponding trajectories for the STC are shown in
Figures 2.1b-2.4b. In the second part, $B_0$ is unknown and M is
arbitrarily chosen as the identity matrix for the STC. The input-output
of a typical run (with other conditions the same as in the first part) is
shown in Figures 2.5a-2.8a for the STR and the corresponding trajectories
for the STC are shown in Figures 2.5b-2.8b.

The simulation results indicate that both self-tuners perform
satisfactorily regardless of the knowledge of $B_0$. However, from Figure 2.3
and Figure 2.4, we notice that $u_1$ and $u_2$ are substantially reduced,
especially during start-up, if the STC is used. The prevention of
excessive control action is attained in this instance. From Figures 2.5-2.8,
it can again be observed that the reduction of excessive control, and thus
of excessive output variance, is achieved even with unknown $B_0$. The STC does
seem to have an edge in terms of smoother control action. However, we
CHAPTER 3

STOCHASTIC  ADAPTIVE  NASH  GAMES 

3.1  Introduction 


In this chapter we will consider the stochastic adaptive Nash game
problem using the self-tuning controller (STC) approach. Nash games were
first introduced and investigated in a static framework in [42]. They
were later extended to the dynamic case [53, 54]. The decision-makers in
a Nash game simultaneously minimize their respective cost functions with
respect to their individual controls. The resulting optimal strategy is
called the Nash equilibrium strategy. This strategy has the property
that if one decision-maker deviates from it, he cannot improve his
performance. However, it may be possible for some or all of the
decision-makers to improve their performance when more than one decision-maker
deviates from the equilibrium strategy. That is, the Nash equilibrium
strategy is secure against unilateral deviation but not necessarily against
collusion. The Nash game framework is thus very appealing for large
scale systems or distributed industrial systems where there are a host of
noncooperative decision-makers or controllers, each trying to minimize his
own cost functional.

Definition 3.1. A strategy set $\{u_1^*, u_2^*, \ldots, u_N^*\}$ is a Nash equilibrium strategy set if

$$J_i(u_1^*, \ldots, u_{i-1}^*, u_i^*, u_{i+1}^*, \ldots, u_N^*) \le J_i(u_1^*, \ldots, u_{i-1}^*, u_i, u_{i+1}^*, \ldots, u_N^*), \quad i = 1, \ldots, N \tag{3.1}$$

for all admissible controls $u_i$ of decision-maker i, where $J_i$ is the cost function that decision-maker i is trying to minimize.
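
The inequality (3.1) can be checked numerically on a toy problem. The following minimal Python sketch assumes a hypothetical two-player static game with scalar controls; the cost functions `cost_1` and `cost_2` and the helper `is_nash` are illustrative choices, not taken from this report. It tests only a finite grid of unilateral deviations, so it can refute a candidate strategy set but does not prove equilibrium in general.

```python
def cost_1(u1, u2):
    # Hypothetical cost of decision-maker 1 (convex in its own control u1).
    return (u1 - 1.0) ** 2 + u1 * u2


def cost_2(u1, u2):
    # Hypothetical cost of decision-maker 2 (convex in its own control u2).
    return (u2 + 2.0) ** 2 - u1 * u2


def is_nash(costs, u_star, deviations, tol=1e-12):
    """Check (3.1) on a grid: no player lowers its own cost by deviating alone."""
    for i, cost in enumerate(costs):
        base = cost(*u_star)
        for d in deviations:
            trial = list(u_star)
            trial[i] += d  # only player i deviates
            if cost(*trial) < base - tol:
                return False
    return True


DEVIATIONS = [-1.0, -0.5, -0.1, 0.1, 0.5, 1.0]
# Setting dJ1/du1 = 0 and dJ2/du2 = 0 gives the candidate (1.6, -1.2).
print(is_nash([cost_1, cost_2], [1.6, -1.2], DEVIATIONS))
print(is_nash([cost_1, cost_2], [0.0, 0.0], DEVIATIONS))
```

In this toy game a unilateral deviation d from (1.6, -1.2) raises the deviating player's cost by d squared, whereas at the origin decision-maker 1 can lower his cost by moving alone.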

As in stochastic optimal control problems, there are different solution concepts, namely open-loop and closed-loop solutions, to the Nash game problem [53]. In general, the open-loop and closed-loop solutions of a game problem are different. However, by restricting the cost function of each decision-maker to a single stage, the distinction between the different types of solution ceases to exist, as we have essentially reduced the problem to a static framework. Moreover, the restriction also enables us to seek steady state solutions to the game problem using the self-tuning approach.

In Section 3.2, the formulation of the Nash game problem is presented. The solution to the stochastic adaptive Nash game problem is discussed in Section 3.3. It turns out that the game solution closely resembles, after a judicious transformation, a minimum variance control problem with one decision-maker. It is this resemblance that enables the established results for the STR to be applied to the game problem. Finally, in Section 3.4, a simulation study on an economic system is presented to illustrate the proposed adaptive solution.

3.2  Problem  Formulation 

Previously, dynamic games have mostly been analyzed for systems in state-space representation. We will formulate the game problem in input-output form so that the self-tuning algorithm can readily be applied. Consider a system given by the equations

$$x(t+1) = F x(t) + G_1 u_1(t) + G_2 u_2(t) + \cdots + G_N u_N(t) + K e(t) \tag{3.2}$$

$$y(t) = T x(t) + e(t) \tag{3.3}$$


where each $u_i$, which is of dimension $m_i$, represents a controller that tries to minimize a cost function given by

$$J_i = E\{[y(t+k+1) - y^r(t+k+1)]^T Q_i [y(t+k+1) - y^r(t+k+1)] + [u(t) - u(t-1)]^T R_i [u(t) - u(t-1)]\}, \quad i = 1, 2, \ldots, N \tag{3.4}$$

where $Q_i$ is a symmetric positive semidefinite matrix and $y^r$ is the desired value of y. The matrix $R_i$ is a symmetric matrix with its i,i-th block, denoted by $(R_i)_{ii}$, being positive definite. The vector u is formed by stacking all the $u_i$ (i = 1, 2, ..., N) in a column. The term [u(t) - u(t-1)] is penalized, rather than u(t) itself, to avoid having to find the reference control signal $u^r$ that corresponds to a nonzero $y^r$. The state vector x(t) is n-dimensional. The input u(t), output y(t) and the noise sequence {e(t)} are all of dimension p (that is, $p = \sum_{i=1}^{N} m_i$). Furthermore, {e(t)} is assumed to be a sequence of independent, identically distributed zero-mean random vectors with finite covariance. Let $G = [G_1 \; G_2 \; \cdots \; G_N]$; then (3.2) becomes

$$x(t+1) = F x(t) + G u(t) + K e(t). \tag{3.5}$$


It can be shown that (3.5) and (3.3) can be transformed to an input-output representation [27], which is given by

$$a(q^{-1}) y(t) = B(q^{-1}) u(t-k-1) + C(q^{-1}) e(t), \quad k \ge 0 \tag{3.6}$$

where k is a known time delay, a(z) is the scalar characteristic polynomial for the system (3.5), and a(z), B(z), C(z) are in the form given in (2.21).

To allow more flexibility, we will consider the system to be governed by

$$a(q^{-1}) y(t) = B(q^{-1}) u(t-k-1) + C(q^{-1}) e(t) + D \tag{3.7}$$

where D is a p-dimensional offset vector.

3.3 Self-Tuning Nash Game

The  self-tuning  approach  is  adopted  to  seek  steady  state  solutions 
for  the  stochastic  adaptive  Nash  game  problem.  As  in  the  usual  analysis 
for  such  an  approach,  the  control  strategy  is  first  derived  assuming  all 
the  parameters  are  known,  then  an  adaptive  procedure  is  incorporated  to 
deal  with  unknown  parameters. 

3.3.1 N-Person Nash Equilibrium Strategy

The  derivation  of  the  Nash  equilibrium  strategy  is  very  similar  to 
that  of  the  STC.  We  will  summarize  the  result  in  a  theorem. 

Theorem 3.1. Let $L^{(i)}$ represent the i-th column block (of dimension $p \times m_i$) of the $p \times p$ matrix L. The Nash equilibrium strategy $u^*(t)$ for the system (3.7) with cost functions (3.4) is given by

$$M G(q^{-1}) \bar{C}(q^{-1}) y(t) + [M F(q^{-1}) \bar{C}(q^{-1}) B(q^{-1}) + (1 - q^{-1}) c(q^{-1}) H] u^*(t) + M F(q^{-1}) \bar{C}(q^{-1}) D - c(q^{-1}) M y^r(t+k+1) = 0 \tag{3.8}$$

with

$$\bar{C}(z) = \operatorname{adjoint} C(z) \tag{3.9a}$$

$$c(z) = \det C(z) \tag{3.9b}$$

$$G(z) = G_0 + G_1 z + \cdots + G_{n-1} z^{n-1} \tag{3.9c}$$

$$F(z) = I + F_1 z + \cdots + F_k z^k \tag{3.9d}$$

$$M = \begin{bmatrix} (B_0^{(1)})^T Q_1 \\ \vdots \\ (B_0^{(N)})^T Q_N \end{bmatrix} \tag{3.9e}$$

$$H = \begin{bmatrix} (R_1^{(1)})^T \\ \vdots \\ (R_N^{(N)})^T \end{bmatrix} \tag{3.9f}$$

and G(z), F(z) satisfy the following identity

$$C(z) = a(z) F(z) + z^{k+1} G(z). \tag{3.9g}$$

Proof. See Appendix B.

Notice that (3.8) is just a system of linear equations, which is extremely convenient computationally compared with other game solutions that usually involve Riccati equations. Another observation is that (3.8) closely resembles the optimal control law for the STC given in Theorem 2.1 except for the definition of the matrices M and H. Hence, the original system (3.7) with y(t) as output can be transformed into an equivalent system with $\phi(t)$ as output, as in the case of the STC.
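
The identity $C(z) = a(z) F(z) + z^{k+1} G(z)$ in Theorem 3.1 fixes F(z) and G(z) by matching coefficients of powers of z. Because a(z) is scalar, the matrix identity holds entry by entry, so a scalar routine applied to each entry of C(z) is enough for illustration. The sketch below is a minimal Python version under the assumption a(0) = 1; the function names and the numerical polynomials are hypothetical and not taken from this report.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def split_identity(a, c, k):
    """Return (f, g) with c(z) = a(z) f(z) + z**(k+1) g(z) and deg f = k."""
    if a[0] != 1.0:
        raise ValueError("this sketch assumes a(0) = 1")
    f = []
    for j in range(k + 1):
        c_j = c[j] if j < len(c) else 0.0
        # Coefficient of z**j: c_j = f_j + sum over i >= 1 of a_i * f_(j-i)
        carry = sum(a[i] * f[j - i] for i in range(1, min(j, len(a) - 1) + 1))
        f.append(c_j - carry)
    af = poly_mul(a, f)
    width = max(len(c), len(af))
    remainder = [
        (c[j] if j < len(c) else 0.0) - (af[j] if j < len(af) else 0.0)
        for j in range(width)
    ]
    # The first k+1 remainder coefficients vanish by construction.
    return f, remainder[k + 1:]


A_POLY = [1.0, -0.5]  # a(z) = 1 - 0.5 z (illustrative values)
C_POLY = [1.0, 0.3]   # c(z) = 1 + 0.3 z (illustrative values)
F_POLY, G_POLY = split_identity(A_POLY, C_POLY, k=1)
print(F_POLY, G_POLY)
```

For these illustrative values the hand calculation gives f(z) = 1 + 0.8 z and g(z) = 0.4, since (1 - 0.5 z)(1 + 0.8 z) + 0.4 z^2 = 1 + 0.3 z.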

Let the function $\phi(t)$ be defined by

$$\phi(t) = M(y(t) - y^r(t)) + H(u(t-k-1) - u(t-k-2)). \tag{3.10}$$

The equivalent transformed system is then given by

$$a(q^{-1}) \phi(t) = (M B(q^{-1}) + (1 - q^{-1}) a(q^{-1}) H) u(t-k-1) + M C(q^{-1}) e(t) + M D - a(q^{-1}) M y^r(t). \tag{3.11}$$

Furthermore, as in the STC, by defining a new cost function I given by

$$I = E\{\phi^T(t+k+1) \phi(t+k+1)\}, \tag{3.12}$$

it is possible to obtain the Nash strategy (3.8) by considering the transformed system (3.11) with the cost function (3.12) as a control problem with only one decision-maker. That is, instead of optimizing $J_i$ with respect to $u_i(t)$ for the i-th decision-maker, every decision-maker can determine the Nash strategy (3.8) by optimizing I with respect to u(t).
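
To make the "system of linear equations" remark concrete, the following Python sketch treats a simplified special case: known parameters, k = 0, two decision-makers with scalar controls, and a one-step prediction that is affine in the control, y_free + B0 u(t), where y_free collects everything that does not depend on u(t). It assumes that row i of M is the transposed i-th column of B0 times Q_i and that row i of H is the i-th row of R_i, which is what the first-order conditions of the cost functions (3.4) give in this special case. All names and numbers are hypothetical.

```python
def build_m_h(b0, q_list, r_list):
    """Stack each player's first-order condition into M and H (2 players)."""
    m_rows, h_rows = [], []
    for i in range(2):
        col = [b0[0][i], b0[1][i]]  # i-th column of B0
        q = q_list[i]
        m_rows.append([col[0] * q[0][j] + col[1] * q[1][j] for j in range(2)])
        h_rows.append(list(r_list[i][i]))  # i-th row of the symmetric R_i
    return m_rows, h_rows


def solve_2x2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular system")
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def nash_control(b0, q_list, r_list, y_free, y_ref, u_prev):
    """Solve (M B0 + H) u = M (y_ref - y_free) + H u_prev for u."""
    m, h = build_m_h(b0, q_list, r_list)
    lhs = [[sum(m[i][k] * b0[k][j] for k in range(2)) + h[i][j]
            for j in range(2)] for i in range(2)]
    gap = [y_ref[k] - y_free[k] for k in range(2)]
    rhs = [sum(m[i][k] * gap[k] for k in range(2))
           + sum(h[i][k] * u_prev[k] for k in range(2)) for i in range(2)]
    return solve_2x2(lhs, rhs)


def cost(i, u, b0, q_list, r_list, y_free, y_ref, u_prev):
    """Deterministic part of J_i for the prediction y_free + B0 u."""
    err = [y_free[k] + sum(b0[k][j] * u[j] for j in range(2)) - y_ref[k]
           for k in range(2)]
    du = [u[j] - u_prev[j] for j in range(2)]

    def quad(v, w):
        return sum(v[a] * w[a][b] * v[b] for a in range(2) for b in range(2))

    return quad(err, q_list[i]) + quad(du, r_list[i])


# Illustrative data (not from the report).
B0 = [[1.0, 0.5], [0.2, 1.0]]
Q = [[[1.0, 0.0], [0.0, 0.1]], [[0.1, 0.0], [0.0, 1.0]]]
R = [[[0.5, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.5]]]
Y_FREE, Y_REF, U_PREV = [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]
U_NASH = nash_control(B0, Q, R, Y_FREE, Y_REF, U_PREV)
print(U_NASH)
```

The design point is that no iteration between the players is needed: stacking the two first-order conditions gives one linear system, and a unilateral deviation from its solution can only raise the deviating player's own cost because each J_i is convex in u_i.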

Another interesting property of the solution (3.8) is that if the penalty on control in the $J_i$'s is zero, and assuming the matrix M is nonsingular, the resulting Nash strategy is equivalent to the MIMO minimum variance strategy developed for the STR, in which there is only one decision-maker. Moreover, premultiplying (3.8) by $M^{-1}$ when H is zero, we see that this Nash strategy is independent of the weighting matrices $Q_i$ (i = 1, 2, ..., N). Essentially, the game flavor of the problem does not arise if no decision-maker penalizes the control effort. On the other hand, even when there are penalties on controls, if all the $Q_i$'s and $R_i$'s are identical for i = 1, 2, ..., N, the Nash strategy collapses to the optimal strategy of the STC in Theorem 2.1. Situations in which every controller has the same cost functional are analyzed in the realm of team theory [48].

3.3.2 N-Person Self-Tuning Nash Equilibrium Strategy

Basically, the same approach utilized in the MIMO STC will be used to deal with unknown parameters. However, further restrictions have to be placed on each decision-maker. From this point on, we assume every controller agrees to use the same estimation scheme and identical initial conditions. These restrictions ensure that every controller has the same model for the system and rid us of the complications of multimodeling. With these restrictions, each decision-maker is essentially a complete STC by himself. That is, there are N identical STCs doing identical computations to compute the Nash equilibrium strategy. In other words, in order to arrive at the Nash strategy for the system (3.7), every decision-maker uses the following model to represent the process

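$$\phi(t) = \mathcal{A}(q^{-1}) y(t-k-1) + \mathcal{B}(q^{-1}) u(t-k-1) + \delta + \mathcal{G}(q^{-1}) y^r(t) + \epsilon(t) \tag{3.13}$$

with $\phi(t)$ defined in (3.10) and

$$\mathcal{A}(z) = \mathcal{A}_0 + \mathcal{A}_1 z + \cdots + \mathcal{A}_{n-1} z^{n-1} \tag{3.14a}$$

$$\mathcal{B}(z) = \mathcal{B}_0 + \mathcal{B}_1 z + \cdots \tag{3.14b}$$

$$\mathcal{G}(z) = \mathcal{G}_0 + \mathcal{G}_1 z + \cdots + \mathcal{G}_{n_c} z^{n_c}, \quad n_c = \text{degree } c(z) \tag{3.14c}$$

$$\delta \text{ is a } p \times 1 \text{ vector} \tag{3.14d}$$

and $\epsilon(t)$ is the error to be minimized in the least squares sense.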

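The certainty equivalent Nash equilibrium strategy for the system (3.13) is given by

$$0 = \hat{\mathcal{B}}(q^{-1}) u(t) + \hat{\mathcal{A}}(q^{-1}) y(t) + \hat{\mathcal{G}}(q^{-1}) y^r(t+k+1) + \hat{\delta} \tag{3.15}$$

where $\hat{W}$ denotes the estimate of W. The parameters can be estimated using (2.40) and (2.41) with

$$\eta^T(t) = [y^T(t) \; y^T(t-1) \; \cdots \; u^T(t) \; u^T(t-1) \; \cdots \; y^{rT}(t+k+1) \; \cdots \; 1] \tag{3.16}$$

and the parameter matrix $\Theta = [\theta_1 \; \theta_2 \; \cdots \; \theta_p]$, whose i-th column $\theta_i$ holds the coefficients of $\mathcal{A}$, $\mathcal{B}$, $\mathcal{G}$ and $\delta$ that multiply $\eta$ in the i-th row of (3.13). (3.17)

The following recursions (for k = 0) are then carried out at each step of time:

$$\hat{\theta}_i(t) = \hat{\theta}_i(t-1) + \frac{\eta(t-1)}{r(t-1)} [\phi_i(t) - \eta^T(t-1) \hat{\theta}_i(t-1)] \tag{3.18}$$

$$r(t-1) = r(t-2) + \eta^T(t-1) \eta(t-1), \quad r(0) = 1. \tag{3.19}$$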
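
The two recursions above amount to a normalized gradient (stochastic approximation) update. The Python sketch below assumes the form "new estimate = old estimate + (regressor / r) x prediction error, with r accumulating the squared regressor norm" for a single scalar component of the transformed output; the function name `sa_step`, the regressor sequence and the "true" parameters are hypothetical and serve only to show the update running on noise-free data.

```python
def sa_step(theta, r_prev, eta, phi_i):
    """One stochastic approximation step for one component of the output."""
    r = r_prev + sum(x * x for x in eta)                    # gain normalizer
    err = phi_i - sum(e * th for e, th in zip(eta, theta))  # prediction error
    theta_new = [th + (e / r) * err for th, e in zip(theta, eta)]
    return theta_new, r


THETA_TRUE = [2.0, -1.0]        # illustrative "true" parameters
theta, r = [0.0, 0.0], 1.0      # initial estimate and normalizer
for t in range(200):
    eta = [1.0, 1.0 if t % 2 == 0 else -1.0]  # deterministic regressor
    phi = sum(e * th for e, th in zip(eta, THETA_TRUE))     # noise-free data
    theta, r = sa_step(theta, r, eta, phi)
print(theta, r)
```

Because the gain decays like 1/r, the estimate approaches the true parameters slowly compared with recursive least squares; the appeal of the scheme is that each step costs only a few inner products.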

3.3.3  Convergence 

Convergence of the estimated prediction of $\phi$ to the true prediction can be analyzed using Theorem 2.3. By defining the matrices M and H according to (3.9e) and (3.9f) respectively, we can apply Theorem 2.3 to show convergence for the game problem, as stated in the following theorem.

Theorem 3.2. Consider the cost functions $J_i$ (i = 1, 2, ..., N) in (3.4) for the system (3.13), which is assumed to satisfy conditions i), ii) and v) of Theorem 2.3 and the following conditions:

iii) upper bounds for the orders of all the scalar polynomials appearing in $\{a(q^{-1}),\ M B(q^{-1}) + (1 - q^{-1}) a(q^{-1}) H,\ C(q^{-1})\}$ are known;

iv) $\det(M B(z) + (1 - z) a(z) H) \ne 0$ for $|z| \le 1$, and $\det C(z) \ne 0$ for $|z| \le 1$,

with M and H as defined in (3.9).

If the Self-Tuning Nash algorithm (3.15)-(3.19) is applied to the system, then the system input-output will remain bounded and the prediction error for $\phi(t)$ will tend to its global minimum with probability one.

3.4 Simulation Example

During recent years, optimal control theory has been widely used in the field of economic analysis [7, 10, 31, 46, 55]. Optimal control theory seems to provide an extremely versatile tool for the economist to determine tradeoffs between policies, economic stabilization policies, long-term investment policies and the like. Hence, the Nash strategy proposed here is applied to a rather simple-minded quarterly economic model with two inputs and two outputs to illustrate the ability of the algorithm to stabilize the system. The two outputs are the consumption expenditure C(t) and private investment I(t). The two inputs are government expenditure G(t) and money supply M(t). All variables are measured in constant 1958 dollars. Further details can be found in [15].

In the United States, the formulation of monetary policy is in the domain of the Federal Reserve System (FRS), while the formulation of fiscal policy is primarily in the hands of the Congress and the President [47]. There have been many instances during which the two "controllers" held different objectives. This certainly falls naturally into a game framework. We will assume that the two controllers (FRS and the federal government) want to stabilize this system along certain target paths or growth patterns. However, they have different views on where the emphasis should be placed, which is manifested by having different cost functionals. Let $J_1$ and $J_2$ be the cost functions of the federal government (Congress and the President) and the FRS respectively. The $J_i$'s are given by

$$J_i = E\{[y(t+1) - y^r(t+1)]^T Q_i [y(t+1) - y^r(t+1)] + [u(t) - u(t-1)]^T R_i [u(t) - u(t-1)]\}$$

where $y(t) = [C(t) \; I(t)]^T$, $u(t) = [G(t) \; M(t)]^T$ and $y^r$ is the desired output. We assume $y^r(t)$ grows at an annual rate of 4% from $y^r(0) = [300 \; 75]^T$.
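
Since the model is quarterly, the 4% annual growth of the target has to be converted to a per-quarter factor. The Python sketch below assumes geometric compounding, so the quarterly factor is the fourth root of 1.04 (a little under 1% per quarter); the report may instead have used a flat 1% per quarter, and the function name `target_path` is hypothetical.

```python
def target_path(y0, annual_rate, quarters):
    """Reference outputs for t = 0..quarters, growing geometrically each quarter."""
    factor = (1.0 + annual_rate) ** 0.25  # assumed quarterly compounding
    return [[value * factor ** t for value in y0] for t in range(quarters + 1)]


PATH = target_path([300.0, 75.0], 0.04, 8)  # two years of quarterly targets
for t, (consumption, investment) in enumerate(PATH):
    print(t, round(consumption, 2), round(investment, 2))
```

After four quarters the targets are 4% above their initial values, that is 312 for consumption and 78 for investment.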

The system is governed by

$$a(q^{-1}) y(t) = B(q^{-1}) u(t-1) + C e(t) + D$$

where the numerical values of a(z), B(z), C(z), and D are listed in Appendix C. The weighting matrices and the covariance of the noise are also included in Appendix C. We assume $B_0$ is known during the simulation.

Simulation results indicate that the system can indeed be stabilized along the targeted growth path. The input-output time responses of a typical run are shown in Figures 3.1-3.4. Figure 3.1b and Figure 3.2b show the output responses with expanded ordinate after the algorithm has settled. We notice that there are extreme fluctuations during start-up. In practical applications, these may not be permitted and can be avoided by starting with initial estimates that yield a satisfactory response. See [18], for instance, for practical considerations.


Figure 3.2b. Time response of $y_2$ for Nash game with expanded ordinate

CHAPTER 4

STOCHASTIC  ADAPTIVE  STACKELBERG  GAMES 

4.1 Introduction

In this chapter, the self-tuning principle will be called upon to solve the stochastic adaptive Stackelberg game problem. The Stackelberg game, or the Leader-Follower (L-F) game, was first introduced in the context of a static economic problem with two decision-makers [52]. It has been extended to dynamic cases in [49, 50, 51]. In the L-F game, one or more groups of decision-makers, which will be called the leader, have more information than the other group, which is called the follower. For instance, the leader knows the cost function of the follower, but the follower may not know the leader's cost function. Equipped with the knowledge of the follower's cost function (and thus the possible rational decision of the follower), the leader will perform his optimization taking into account the possible reaction of the follower. In the L-F game, the leader will announce his strategy first or act first. The follower then performs his optimization subject to his knowledge of the leader's action; that is, he is reacting to the leader's decision. Even though the computation for the leader may be more complicated than in the Nash game case, he will do no worse in terms of cost, and in general will do better, using the L-F strategy rather than the Nash strategy. In general, however, nothing can be stated regarding the cost for the follower compared to his cost in the Nash game case. The L-F game framework is particularly appealing for the optimization of hierarchical or multilevel systems where the follower or lower level controllers may have limited access to certain information or limited computing capability. In an economic system, for example, the government may be the leader over the business community because of its vast data base. Another example is a distributed control system in which the local process control computers may have limited computing capacity compared to the central computer.

There are a host of variations of the L-F game. For instance, the members of the leader group and/or the follower group may elect to use the Nash strategy among themselves instead of conforming to one single objective within their respective group. There may also be N groups of decision-makers with a hierarchical structure such that each higher level controller is a leader to the succeeding controller [13, 24]. In this report, we will concentrate on 2-Person Stackelberg games with $DM_1$ denoting the leader and $DM_2$ denoting the follower.

Definition 4.1. Let $DM_1$, the leader, choose control $u_1 \in U_1$ and $DM_2$, the follower, choose control $u_2 \in U_2$, where $U_1$ and $U_2$ are the sets of admissible controls. The cost function associated with $DM_i$ is $J_i$ (i = 1, 2). Assume there exists a mapping $T: U_1 \to U_2$. For each control $u_1$ chosen by $DM_1$, $DM_2$ chooses $u_2 = T(u_1)$ such that

$$J_2(u_1, T(u_1)) \le J_2(u_1, u_2), \quad \forall\, u_2 \in U_2. \tag{4.1}$$

The leader, $DM_1$, chooses $u_1^*$ such that

$$J_1(u_1^*, T(u_1^*)) \le J_1(u_1, T(u_1)), \quad \forall\, u_1 \in U_1. \tag{4.2}$$

The strategy pair $(u_1^*,\ u_2^* = T(u_1^*))$ is called the Stackelberg equilibrium strategy pair.
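
The two-level optimization in Definition 4.1 can be sketched for a toy static game. The Python code below assumes two hypothetical scalar quadratic costs (not from this report), computes the follower's reaction T(u1) by a one-dimensional ternary search, and then lets the leader minimize his cost along that reaction curve. The search interval and the helper names are illustrative assumptions; the nested numerical search is only accurate to a few decimal places.

```python
def argmin_scalar(f, lo=-10.0, hi=10.0, iters=100):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def cost_1(u1, u2):
    # Hypothetical leader cost.
    return (u1 - 1.0) ** 2 + u1 * u2


def cost_2(u1, u2):
    # Hypothetical follower cost.
    return (u2 + 2.0) ** 2 - u1 * u2


def reaction(u1):
    """Follower's best reply T(u1) to an announced leader control."""
    return argmin_scalar(lambda u2: cost_2(u1, u2))


def stackelberg():
    """Leader minimizes J1(u1, T(u1)); the follower then plays T(u1)."""
    u1 = argmin_scalar(lambda v: cost_1(v, reaction(v)))
    return u1, reaction(u1)


U1_S, U2_S = stackelberg()
# For these costs, T(u1) = u1/2 - 2 analytically and the leader's optimum is u1 = 4/3.
# The Nash point of the same two costs is (1.6, -1.2), from the two first-order conditions.
print("leader cost, Stackelberg:", cost_1(U1_S, U2_S))
print("leader cost, Nash:       ", cost_1(1.6, -1.2))
```

In this example the leader's cost is lower under the Stackelberg strategy than at the Nash point of the same two costs, in line with the remark that the leader does no worse by leading.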


As in the Nash game problem, there are different solution concepts for the L-F game. The solutions to open-loop, feedback, and closed-loop L-F games are in general different [21, 23, 44]. However, if we restrict the cost functions of the decision-makers to a single stage, we reduce the problem to a static one and circumvent the problem of different types of solution.

In Section 4.2, the L-F game problem will be formulated. The solution to the stochastic adaptive Stackelberg game problem is presented in Section 4.3. The same basic approach used in the analysis of the Nash game problem is found to be quite appropriate. A simulation example of an economic system is presented in Section 4.4.

4.2  Problem  Formulation 

Consider a system given by an input-output description

$$a(q^{-1}) y(t) = B(q^{-1}) u(t-k-1) + C(q^{-1}) e(t) + D, \quad k \ge 0 \tag{4.3}$$

where a(z), B(z), C(z) and D are as defined in (3.7). The vector $u(t) = [u_1(t) \; u_2(t)]^T$, with $u_1$ being the control of the leader and $u_2$ being the control of the follower. The output y(t), input u(t), and noise sequence {e(t)} are all of dimension p. {e(t)} is assumed to be a sequence of independent, identically distributed zero-mean random vectors with finite covariance. The cost function associated with the i-th decision-maker is given by

$$J_i = E\{[y(t+k+1) - y^r(t+k+1)]^T Q_i [y(t+k+1) - y^r(t+k+1)] + [u(t) - u(t-1)]^T R_i [u(t) - u(t-1)]\}, \quad i = 1, 2 \tag{4.4}$$

where $y^r$ is the desired output of the system and $Q_i$ and $R_i$ are symmetric positive semidefinite matrices. The reason for penalizing the change of control between time steps is, as mentioned before, to avoid the problem of calculating the reference control signal associated with a nonzero reference output.


4.3 Self-Tuning Leader-Follower Game


The L-F equilibrium strategy will first be derived for the system assuming all the parameters are known. Then an adaptive scheme similar to the one used in the Nash game problem is presented to deal with unknown parameters. For ease of derivation, we will limit the controls $u_1$ and $u_2$ to be scalar valued. That is, we will consider a two-input two-output system in this chapter. The results, however, can easily be extended to vector valued controls.

4.3.1 Two-Person Leader-Follower Equilibrium Strategy


The derivation of the L-F equilibrium strategy from the necessary conditions is basically the same as that of the STC except for certain modifications. The result is summarized in the following theorem.


Theorem 4.1. Let $L^{(i)}$ represent the i-th column of a matrix L and let $(R)_{ij}$ denote the i,j-th entry of the matrix R. The Leader-Follower equilibrium strategy $u^*(t) = (u_1^*(t) \; u_2^*(t))^T$ for the system (4.3) with cost functions (4.4) is given by

$$M G(q^{-1}) \bar{C}(q^{-1}) y(t) + [M F(q^{-1}) \bar{C}(q^{-1}) B(q^{-1}) + (1 - q^{-1}) c(q^{-1}) H] u^*(t) + M F(q^{-1}) \bar{C}(q^{-1}) D - c(q^{-1}) M y^r(t+k+1) = 0 \tag{4.5}$$

with

$$\bar{C}(z) = \operatorname{adjoint} C(z) \tag{4.6a}$$

$$c(z) = \det C(z) \tag{4.6b}$$

$$G(z) = G_0 + G_1 z + \cdots + G_{n-1} z^{n-1} \tag{4.6c}$$

$$F(z) = I + F_1 z + \cdots + F_k z^k \tag{4.6d}$$

$$M = \ldots \tag{4.6e}$$

$$H = \ldots \tag{4.6f}$$

where

$$\ldots \tag{4.6g}$$

and G(z), F(z) satisfy

$$C(z) = a(z) F(z) + z^{k+1} G(z). \tag{4.7}$$

Proof. See Appendix D.

It can readily be observed that the L-F strategy resembles the Nash strategy (3.8) and the STC strategy (2.22) in every aspect except for the definition of the matrices M and H. Another remark is that if there is no penalty on the control (H = 0), and assuming that M is nonsingular, the L-F strategy reduces to the minimum variance strategy for the STR, and this resulting L-F strategy is independent of the weights $Q_i$ (i = 1, 2). That is, the game aspects of the problem do not arise if there is no penalty on the control action.

Notice that the leader will have to solve (4.5) to obtain the L-F strategy. The follower, on the other hand, only needs to go through part of (4.5) to obtain $u_2^*$, since the leader acts first and thus $u_1^*$ is known to the follower. Also, notice that the follower only needs to know $M_2$ and $H_2$ in order to solve for $u_2^*$. The fact that the follower may not know the leader's cost function is again very transparent in this instance.

To utilize the self-tuning approach, the original system is transformed to an equivalent system as in the Nash game analysis. Let $\phi(t)$ be defined by

$$\phi(t) = M(y(t) - y^r(t)) + H(u(t-k-1) - u(t-k-2)). \tag{4.8}$$

The transformed system in $\phi$ becomes

$$a(q^{-1}) \phi(t) = (M B(q^{-1}) + (1 - q^{-1}) a(q^{-1}) H) u(t-k-1) + M C(q^{-1}) e(t) + M D - a(q^{-1}) M y^r(t). \tag{4.9}$$

If we define a cost function I given by

$$I = E\{\phi^T(t+k+1) \phi(t+k+1)\}, \tag{4.10}$$

it is possible, as in the Nash game case, to obtain the L-F equilibrium strategy by considering the transformed system as a minimum variance control problem with one decision-maker.


4.3.2  Two-Person  Self-Tuning  Stackelberg  Strategy 

In order to deal with unknown parameters, we assume all the decision-makers agree to use the same estimation scheme and identical initial conditions. The parameter $B_0$ is assumed to be known so that the matrix M can be computed.

To control the process (4.3) with cost functions given by (4.4), each decision-maker will use the following model as a representation of the system

$$\phi(t) = \mathcal{A}(q^{-1}) y(t-k-1) + \mathcal{B}(q^{-1}) u(t-k-1) + \mathcal{G}(q^{-1}) y^r(t) + \delta + \epsilon(t) \tag{4.11}$$

where $\mathcal{A}(z)$, $\mathcal{B}(z)$, $\mathcal{G}(z)$ and $\delta$ are as defined in (3.14) and $\epsilon(t)$ is the error to be minimized in the least squares sense.

The certainty equivalent L-F strategy $u^*$ is then given by

$$0 = \hat{\mathcal{B}}(q^{-1}) u^*(t) + \hat{\mathcal{A}}(q^{-1}) y(t) + \hat{\mathcal{G}}(q^{-1}) y^r(t+k+1) + \hat{\delta} \tag{4.12}$$

where $\hat{L}$ denotes the estimate of L. The parameters can be estimated using the stochastic approximation scheme given by (3.16)-(3.19).

To further appreciate the structure of the self-tuning L-F game, we will go into some interesting properties of this adaptive procedure. The leader in the game will have to estimate all the controller parameters in order to compute $u^*$. On the other hand, the follower, who acts after the leader has acted, has a simpler estimation computation. Specifically, the follower's estimation computation is part of the leader's computations. Let us elaborate by further considering the case where $u_1$ and $u_2$ are scalar-valued. For simplicity, assume D = 0 and $y^r = 0$. Let $\mathcal{A}(z)$, $\mathcal{B}(z)$, $\phi(t)$ be given by

$$\mathcal{A}(z) = \begin{bmatrix} a_{11}(z) & a_{12}(z) \\ a_{21}(z) & a_{22}(z) \end{bmatrix} \tag{4.13a}$$

$$\mathcal{B}(z) = \begin{bmatrix} b_{11}(z) & b_{12}(z) \\ b_{21}(z) & b_{22}(z) \end{bmatrix} \tag{4.13b}$$

$$\phi(t) = \begin{bmatrix} \phi_1(t) \\ \phi_2(t) \end{bmatrix} = \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} y(t) + \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} u(t-k-1). \tag{4.14}$$


From (4.14), $\phi_2(t)$ is given by

$$\phi_2(t) = M_2 y(t) + H_2 u(t-k-1) \tag{4.15}$$

and $M_2$, $H_2$ are functions of the follower's cost function only. The follower, in fact, only requires $\phi_2$ for his controller. Consider the following equation, which is part of (4.11):

$$\phi_2(t+k+1) - \epsilon_2(t+k+1) = a_{21}(q^{-1}) y_1(t) + a_{22}(q^{-1}) y_2(t) + b_{21}(q^{-1}) u_1(t) + b_{22}(q^{-1}) u_2(t). \tag{4.16}$$

The follower's optimal strategy $u_2^*$, with $u_1^*$ available after the leader's action, is given by

$$b_{22}(q^{-1}) u_2^*(t) = -a_{21}(q^{-1}) y_1(t) - a_{22}(q^{-1}) y_2(t) - b_{21}(q^{-1}) u_1^*(t). \tag{4.17}$$

Hence, the follower will save some computational effort compared to the leader.

4.3.3  Convergence 

Convergence results for the L-F game problem are exactly the same as stated in Theorem 3.2 except for the change in the definition of the matrices M and H.


Theorem 4.2. Let the matrices M and H be defined by (4.6e) and (4.6f) respectively and assume the conditions on the system in Theorem 3.2 are satisfied. If the self-tuning L-F strategy (4.12) is applied to the system (4.3), then the system input-output will remain bounded and the prediction error for $\phi(t)$ will tend to its global minimum achievable by any causal linear feedback with probability one.


4.4 Simulation Study


The economic model presented in Section 3.4 is again used to study the performance of the self-tuning strategy. In this case, we assume the federal government (Congress and the President) to be the leader and the FRS to be the follower. The same $Q_i$ and $R_i$ used in the Nash game simulation are used in this case.

Simulation results indicate that the algorithm can indeed stabilize the system along the targeted 1% quarterly growth. The input-output time responses of a typical run are shown in Figures 4.1-4.4. Figure 4.1b and Figure 4.2b show the output responses with expanded ordinate after the algorithm has settled. Again, there are extreme fluctuations during the start-up period as the controller is trying to learn the characteristics




Figure 4.1a. Time response of $y_1$ for Stackelberg game


CHAPTER  5 


DECENTRALIZED  STOCHASTIC  ADAPTIVE  NASH  GAMES 

5.1 Introduction

In this chapter, an explicit self-tuning method is utilized to develop an algorithm for systems with unknown parameters and multiple controllers, each of which, besides having a different objective, has a different set of information about the system. This decentralized system framework is suitable for analyzing large scale interconnected systems in which the communication and/or computational costs involved may prohibit the implementation of a centralized control policy. Decentralized information among decision-makers was first studied in the framework of static team theory in [48] and was further extended in [11, 12, 34, 35, 36].

In this report, we will confine our analysis to Two-Person decentralized stochastic adaptive Nash games with an information structure termed "one-step-delay sharing pattern" [57]. We will restrict the cost function of each decision-maker to a single stage, thus turning the dynamic situation into a static Nash game framework. In Section 5.2, the formulation of the decentralized Nash game problem is presented. In Section 5.3, we approach the known parameter problem by imposing a straightforward constraint on the form of the control policy, as was done for the single controller problem in [32, 37, 41]. In Section 5.4, another approach is used to tackle the known parameter decentralized game problem. Specifically, we extend results on static Nash games in [11, 12] to our problem.


To deal with the unknown parameter case, a recursive estimator is used to determine the system parameters explicitly. We force the certainty equivalence condition upon the system and substitute the estimates for the true parameters in the control law. The proposed algorithm can be classified as an explicit self-tuning strategy, as the system parameters are estimated explicitly and then manipulated to determine the optimal policy. Even though convergence for this procedure is not guaranteed, our simulation studies for an economic system, which are presented in Section 5.5, do show that the algorithm is capable of stabilizing the system along a desired path asymptotically. Furthermore, our simulation results indicate that the two different decentralized approaches generate the same optimal policy, hinting that the two methods may actually be equivalent.

5.2  Problem  Formulation 

Consider a system with multiple decision-makers, each of whom has $u_i$ (i = 1, 2, ..., N) as his control. The system is governed by

$$y(t+1) = a(q^{-1}) y(t) + B(q^{-1}) u(t) + e(t+1) + D \tag{5.1}$$

where u(t) is formed by stacking up all the $u_i(t)$. The i-th component of y, $y_i$, is assumed to be of the same dimension as $u_i$. The sequences {y(t)}, {u(t)}, {e(t)} are all of dimension p. The disturbance sequence {e(t)} is an independent, identically distributed zero-mean white noise with finite covariance given by $E\{e(t) e^T(t)\} = W$. B(z) is a matrix polynomial and a(z) is a scalar polynomial, as given by

$$a(z) = a_0 + a_1 z + \cdots + a_n z^n \tag{5.2}$$

$$B(z) = B_0 + B_1 z + \cdots + B_n z^n. \tag{5.3}$$

The D in (5.1) is a p-dimensional offset vector with i-th row block $D_i$, which is also of the same dimension as $u_i$. A steady state decentralized Nash equilibrium strategy for the system is to be sought. The cost function of each decision-maker is given by

$$J_i = E\{[y_i(t+1) - y_i^r(t+1)]^T Q_i [y_i(t+1) - y_i^r(t+1)] + [u_i(t) - u_i(t-1)]^T R_i [u_i(t) - u_i(t-1)]\}, \quad i = 1, 2, \ldots, N \tag{5.4}$$

where $Q_i$ is symmetric positive semidefinite, $R_i$ is symmetric positive definite and $y_i^r$ is the desired value of the i-th output $y_i$.

In our problem, at every step of time t, the i-th decision-maker is
assumed to have: i) $y_i(t)$ and past outputs $y(t-1), y(t-2), \dots$;
ii) past inputs $u(t-1), u(t-2), \dots$ as his information. This class of
information pattern is called the "one-step-delay sharing pattern" [57].
The i-th decision-maker attempts, under this information structure, to
minimize (5.4) with respect to $u_i(t)$ with the assumption that the other
decision-makers use the Nash equilibrium strategy as well. In this report,
the number of decision-makers is limited to two. However, the algorithm can
easily be generalized to more than two controllers once the methodology of
the solution is understood.

To facilitate our analysis, the cost functional (5.4) will be decomposed
to a form in which only the part that directly affects the optimization
result is kept. It can be shown by straightforward substitution that (5.4)
can be written in the following form:

    $J_i = E\{u_i^T(t) D_{ii} u_i(t) + 2u_i^T(t) D_{ij} u_j(t)$
    $\qquad + 2a_0 y_i^T(t) Q_i (B_0)_{ii} u_i(t) + 2[\bar a(q^{-1}) y_i(t)$
    $\qquad + \bar B_{ij}(q^{-1}) u_j(t) + \bar B_{ii}(q^{-1}) u_i(t)$
    $\qquad + D_i - y_i^r(t+1)]^T Q_i (B_0)_{ii} u_i(t)$
    $\qquad + 2[u_i^T(t-1) R_i u_i(t)]\}$
    $\qquad +$ terms not involving $u_i(t)$ ,  $i,j = 1,2$ ;  $i \neq j$    (5.5)

with

    $D_{ii} = (B_0)_{ii}^T Q_i (B_0)_{ii} + R_i$    (5.6)

    $D_{ij} = (B_0)_{ii}^T Q_i (B_0)_{ij}$    (5.7)

and $(B_0)_{ij}$ denotes the i,j-th block of the zero-th order element, $B_0$,
of the matrix polynomial $B(z)$. $\bar B_{ij}(z)$ denotes the i,j-th block of
$B(z)$ with $(B_0)_{ij}$ taken out, that is,

    $\bar B_{ij}(z) = B_{ij}(z) - (B_0)_{ij}$ .    (5.8)

The scalar polynomial $\bar a(z)$ is similarly defined as

    $\bar a(z) = a(z) - a_0$ .    (5.9)
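
To see where the weights in (5.6) and (5.7) come from, it may help to isolate the part of the tracking error that depends on the current controls. The lines below are only a sketch: the symbol $w_i(t)$ is introduced here for illustration and is not part of the original notation, and $y_i^r$ stands for the desired output of (5.4).

```latex
\begin{aligned}
y_i(t+1) - y_i^{r}(t+1)
  &= (B_0)_{ii}\,u_i(t) + (B_0)_{ij}\,u_j(t) + w_i(t) + e_i(t+1), \\
w_i(t)
  &= a(q^{-1})\,y_i(t) + \bar B_{ii}(q^{-1})\,u_i(t)
     + \bar B_{ij}(q^{-1})\,u_j(t) + D_i - y_i^{r}(t+1), \\[4pt]
\text{terms of (5.4) quadratic in the current controls:}\quad
  & u_i^T(t)\big[(B_0)_{ii}^T Q_i (B_0)_{ii} + R_i\big]u_i(t)
    + 2\,u_i^T(t)\,(B_0)_{ii}^T Q_i (B_0)_{ij}\,u_j(t).
\end{aligned}
```

Since $w_i(t)$ contains only the current output and past inputs, the bracketed matrix is $D_{ii}$ and the cross weight is $D_{ij}$, which agrees with (5.6) and (5.7).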

We will let $\bar J_i$ denote the "active" part of $J_i$ in (5.5), that is,
the part that involves $u_i(t)$. Hence, we have

    $J_i = \bar J_i +$ terms not involving $u_i(t)$ .    (5.10)


5.3  Constrained  Decentralized  Nash  Game 


Consider a system governed by (5.1) in which each controller has a
cost function given by

    $\bar J_i = E\{u_i^T(t) D_{ii} u_i(t) + 2u_i^T(t) D_{ij} u_j(t)$
    $\qquad + 2a_0 y_i^T(t) Q_i (B_0)_{ii} u_i(t) + 2[\bar a(q^{-1}) y_i(t)$
    $\qquad + \bar B_{ii}(q^{-1}) u_i(t) + \bar B_{ij}(q^{-1}) u_j(t)$
    $\qquad + D_i - y_i^r(t+1)]^T Q_i (B_0)_{ii} u_i(t)$
    $\qquad + 2u_i^T(t-1) R_i u_i(t)\}$ ,  $i,j = 1,2$ ;  $i \neq j$ .    (5.11)

Let the matrix $C_i$ be defined by

    $C_i = a_0 Q_i (B_0)_{ii}$ ,    (5.12)

and let the function $x_i$ be defined by

    $x_i(t) = (B_0)_{ii}^T Q_i [\bar a(q^{-1}) y_i(t) + \bar B_{ii}(q^{-1}) u_i(t)$
    $\qquad + \bar B_{ij}(q^{-1}) u_j(t) + D_i - y_i^r(t+1)]$
    $\qquad + R_i u_i(t-1)$ ,  $i = 1,2$ .    (5.13)

Notice that at time t, the value of $x_i(t)$ is known as it does not depend
on any future data.

Now we can rewrite $\bar J_i$ as

    $\bar J_i = E\{u_i^T(t) D_{ii} u_i(t) + 2u_i^T(t) D_{ij} u_j(t) + 2y_i^T(t) C_i u_i(t)$
    $\qquad + 2x_i^T(t) u_i(t)\}$ ,  $i,j = 1,2$ .    (5.14)
The constrained decentralized Nash equilibrium strategy for the
system (5.1) with cost functional in (5.14) is first presented in
Section 5.3.1 assuming all the system parameters are known. Then,
certainty equivalence is invoked heuristically and a stochastic
approximation type estimation scheme is used to obtain estimates that
are substituted into the optimal policy in place of the true parameters.

5.3.1  Nash  Game  with  Constrained  Policy 

Let the control $u_i$ of the i-th decision-maker be of dimension $m_i$,
$i = 1,2$. Thus, the associated output measurement $y_i$ for the i-th
controller is also $m_i$-dimensional. The i-th decision-maker tries to
minimize $\bar J_i$ with respect to $u_i$, which is of the form

    $u_i(t) = G_i y_i(t) + g_i$ ,  $i = 1,2$    (5.15)

where $G_i$ is an $m_i \times m_i$ matrix and $g_i$ is an $m_i$-dimensional
vector. The constrained policy is stated in the following theorem.
Theorem 5.1. Let the characteristic root of a matrix A with maximum
absolute value be denoted by $\lambda_m(A)$; then the condition

    $|\lambda_m(D_{11}^{-1} D_{12} D_{22}^{-1} D_{21})| < 1$    (5.16)

is sufficient for the system (5.1) with cost functions (5.14) to admit a
unique Nash solution. The gains $G_i$, $g_i$ of (5.15) satisfy the following:

    $G_i - D_{ii}^{-1} D_{ij} D_{jj}^{-1} D_{ji} G_i W_{ij} W_{jj}^{-1} W_{ji} W_{ii}^{-1}$
    $\qquad = -D_{ii}^{-1} C_i^T + D_{ii}^{-1} D_{ij} D_{jj}^{-1} C_j^T W_{ji} W_{ii}^{-1}$ ,    (5.17)

    $g_i + D_{ii}^{-1} D_{ij} g_j = -D_{ii}^{-1} [D_{ii} G_i \bar y_i(t) + D_{ij} G_j \bar y_j(t)$
    $\qquad + C_i^T \bar y_i(t) + x_i(t)]$ ,  $i,j = 1,2$ ;  $i \neq j$    (5.18)

where $\bar y_i$ denotes the expectation of $y_i$ and $W_{ij}$ is the i,j-th
block of the noise covariance matrix W.


Proof. See Appendix E.


Notice that the gain $G_i$ is dependent on $a_0$, $B_0$, and W only.
Hence, in the case when the $a_0$, $B_0$ parameters are known, once $G_i$ is
determined it does not require further computation.
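
Because the gains enter linearly, they can be obtained with one linear solve. The Python sketch below works from the pair of coupled equations $D_{11}G_1W_{11} + D_{12}G_2W_{21} = -C_1^TW_{11}$ and $D_{21}G_1W_{12} + D_{22}G_2W_{22} = -C_2^TW_{22}$ derived in Appendix E, from which (5.17) follows by eliminating one gain. It assumes $C_i = a_0 Q_i (B_0)_{ii}$, as read off by comparing the $a_0$ term of (5.11) with the $C_i$ term of (5.14). The function name, the argument layout and every number in the example other than the weights of Section 5.5 are hypothetical.

```python
import numpy as np

def constrained_nash_gains(a0, B0, Q, R, W, m):
    """Gains G_1, G_2 of the constrained policy u_i = G_i y_i + g_i.

    a0 : scalar a_0;  B0 : p x p matrix B_0;  W : p x p noise covariance
    Q, R : [Q_1, Q_2] and [R_1, R_2];  m : (m_1, m_2) with m_1 + m_2 = p
    Returns ((G1, G2), D, C, rho), rho being the spectral radius in (5.16).
    """
    m1, m2 = m
    idx = [slice(0, m1), slice(m1, m1 + m2)]
    B0 = np.asarray(B0, dtype=float)
    W = np.asarray(W, dtype=float)
    Q = [np.atleast_2d(np.asarray(q, dtype=float)) for q in Q]
    R = [np.atleast_2d(np.asarray(r, dtype=float)) for r in R]

    def blk(M, i, j):
        return M[idx[i], idx[j]]

    D = [[None, None], [None, None]]
    C = []
    for i in range(2):
        Bii = blk(B0, i, i)
        C.append(a0 * (Q[i] @ Bii))                 # assumed form of C_i
        for j in range(2):
            D[i][j] = Bii.T @ Q[i] @ blk(B0, i, j)  # (5.7) when i != j
        D[i][i] = D[i][i] + R[i]                    # (5.6)

    P = np.linalg.solve(D[0][0], D[0][1]) @ np.linalg.solve(D[1][1], D[1][0])
    rho = float(np.max(np.abs(np.linalg.eigvals(P))))

    # vec(A X B) = (B^T kron A) vec(X), with column-major vec.
    vec = lambda M: M.reshape(-1, order="F")
    A = np.block([
        [np.kron(blk(W, 0, 0).T, D[0][0]), np.kron(blk(W, 1, 0).T, D[0][1])],
        [np.kron(blk(W, 0, 1).T, D[1][0]), np.kron(blk(W, 1, 1).T, D[1][1])],
    ])
    b = -np.concatenate([vec(C[0].T @ blk(W, 0, 0)), vec(C[1].T @ blk(W, 1, 1))])
    z = np.linalg.solve(A, b)
    G1 = z[:m1 * m1].reshape(m1, m1, order="F")
    G2 = z[m1 * m1:].reshape(m2, m2, order="F")
    return (G1, G2), D, C, rho

# Scalar controllers; Q_i, R_i follow Section 5.5, the rest is made up.
(G1, G2), D, C, rho = constrained_nash_gains(
    0.9, [[0.3, 0.4], [-0.07, 1.5]], [5.0, 10.0], [0.02, 0.08],
    [[4.0, 1.0], [1.0, 3.0]], (1, 1))
```

The returned `rho` can be compared with 1 to check the sufficient condition (5.16) before the gains are used.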

5.3.2 Self-Tuning Constrained Decentralized Nash Game


In order to obtain the Nash strategy for the system with unknown
parameters, we propose an ad hoc method of certainty equivalence. In this
procedure we will assume $a_0$ and $B_0$ are known to avoid possibilities of
non-existence of solutions for (5.17). Furthermore, we will allow a unit
delay in the estimation scheme, that is, at time t, the system parameter
estimates used for the control computation are based on past input-output
data only. In addition, we assume each decision-maker uses the same
estimation scheme and initial conditions so that the problem of
multimodelling can be avoided.

The recursive procedure in [40], which is a stochastic approximation
type algorithm, can be used to estimate the system parameters explicitly.
Introduce the parameter matrix $\theta$ defined by


where each $\alpha_i$ ($i = 0,1,2,\dots,n$) is a diagonal matrix. The
following recursions are carried out at each step of time to estimate
$\theta_j$, $j = 1,\dots,p$:

    $\theta_j(t) = \theta_j(t-1) + \frac{\gamma(t)}{r_j(t)}\,\eta_j(t-1)\,[y_j(t) - \eta_j^T(t-1)\theta_j(t-1)]$    (5.20)

    $r_j(t) = r_j(t-1) + \gamma(t)[\eta_j^T(t-1)\eta_j(t-1) - r_j(t-1)]$ ,    (5.21)

    $r_j(0) = 1$

    $\eta_j(t-1) = [y_j(t-1) \dots y_j(t-n) \;\; u^T(t-1) \dots u^T(t-n) \;\; 1]^T$    (5.22)

with $\gamma(t)$ being a decreasing sequence in t. Notice that the assumption
that $a_0$, $B_0$ are known will lead to the setting of $\hat\alpha_0 = a_0 I$
and $\hat B_0 = B_0$.
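
A minimal Python sketch of one pass through the recursions is given below. It assumes the normalized-gain form of a stochastic approximation update, in which the correction is scaled by $\gamma(t)/r_j(t)$; that scaling, the function name `sa_update` and the choice $\gamma(t) = 1/t$ in the usage lines are assumptions made for illustration, not details confirmed by the text.

```python
import numpy as np

def sa_update(theta, r, eta, y, gamma):
    """One stochastic-approximation step for a single output row.

    theta : current estimate theta_j(t-1), shape (d,)
    r     : current normalisation r_j(t-1), scalar
    eta   : regressor eta_j(t-1), shape (d,)
    y     : newly measured output y_j(t), scalar
    gamma : step size gamma(t)
    """
    eta = np.asarray(eta, dtype=float)
    r_new = r + gamma * (eta @ eta - r)                # recursion (5.21)
    err = y - eta @ theta                              # one-step prediction error
    theta_new = theta + (gamma / r_new) * eta * err    # assumed form of (5.20)
    return theta_new, r_new

# Hypothetical usage on synthetic data with gamma(t) = 1/t and r(0) = 1.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0])
theta_hat, r_hat = np.zeros(2), 1.0
for t in range(1, 4001):
    eta_t = rng.standard_normal(2)
    y_t = eta_t @ theta_true + 0.01 * rng.standard_normal()
    theta_hat, r_hat = sa_update(theta_hat, r_hat, eta_t, y_t, 1.0 / t)
```

With a scalar normalization the update needs no matrix inversion, which is the usual reason for preferring this type of scheme over recursive least squares when computation is limited; the price is slower convergence.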

A block diagram of the closed-loop system is shown in Figure 5.1.

Convergence of the estimator will certainly lead to the convergence
of the Nash strategy. The conditions for convergence of the filter
equations (5.20) and (5.21) have been investigated in [40]. It is shown that
if u(t) is a white noise process, then the estimates will yield a correct
description of the input-output data. Conceivably, in a multivariable
system, there may be different sets of estimates that yield the same
description of the system. Hence, a suitable identifiability condition on
the system is required to ensure proper convergence of the adaptive
scheme. Identifiability conditions for multivariable systems have been
investigated in [30, 55].

5.4  Extended  Static  Decentralized  Nash  Game 

In this section, we will solve the known parameter case by applying
results in [11, 12] to our present problem. Then the estimation scheme
(5.19)-(5.22) is used to obtain explicitly the system parameters, which
are then substituted into the optimal policy derived from the known
parameter case.

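Before utilizing the results in static Nash games, reformulation of
the problem into the appropriate setting is required. The cost function
in (5.11) is rewritten in the form

    $\bar J_i = E\{u_i^T(t) D_{ii} u_i(t) + 2u_i^T(t) D_{ij} u_j(t) + 2x^T(t) C_i u_i(t)\}$ ,
    $i,j = 1,2$ ;  $i \neq j$    (5.23)

with $D_{ij}$ as defined in (5.6) and (5.7) and

    $C_1 = \begin{bmatrix} I_{m_1} \\ 0 \end{bmatrix}$ ,  $C_2 = \begin{bmatrix} 0 \\ I_{m_2} \end{bmatrix}$    (5.24)

where $m_i$ = dimension of $u_i$ and

    $x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$    (5.25)

where

    $x_i(t) = (B_0)_{ii}^T Q_i [a_0 y_i(t) + \bar a(q^{-1}) y_i(t) + \bar B_{ii}(q^{-1}) u_i(t)$
    $\qquad + \bar B_{ij}(q^{-1}) u_j(t) + D_i - y_i^r(t+1)] + R_i u_i(t-1)$ ,
    $i,j = 1,2$ ;  $i \neq j$ .    (5.26)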

Notice that in $x_i(t)$, the term $(B_0)_{ii}^T Q_i a_0 y_i(t)$ contains the
current measurement that is available only to the i-th decision-maker at
time t. The other terms in $x(t)$ are dependent on past input-output data
that are available to every decision-maker under the "one-step-delay
sharing pattern". Hence, $x_i(t)$ can be considered as information that is
privileged to the i-th controller only.


5.4.1 Extended Static Decentralized Nash Solution

Consider $x(t)$ as the state vector in a state-space representation of
a system in which the i-th controller has $z_i(t)$ as his measurement. The
measurement $z_i(t)$ is given by

    $z_i(t) = H_i x(t) + v_i(t)$ ,  $i = 1,2$    (5.27)

where

    $H_1 = [\, I_{m_1} \;\; 0 \,]$    (5.28a)

    $H_2 = [\, 0 \;\; I_{m_2} \,]$    (5.28b)

and $v_i(t)$ is zero mean white noise with positive semidefinite covariance
$T_i$, $i = 1,2$. To utilize results in [12], the mean value of $x(t)$,
$\bar x(t)$, and the covariance of $x(t)$, cov$\{x(t)\}$, are also required.
We illustrate here how $\bar x(t)$ and cov$\{x(t)\}$ can be computed. At time
t, past input-output data are known, thus

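    $\bar x_i(t) = E\{x_i(t)\}$
    $\qquad = (B_0)_{ii}^T Q_i [a_0 \bar y_i(t) + \bar a(q^{-1}) y_i(t) + \bar B_{ii}(q^{-1}) u_i(t)$
    $\qquad + \bar B_{ij}(q^{-1}) u_j(t) + D_i - y_i^r(t+1)] + R_i u_i(t-1)$ ,
    $i,j = 1,2$ ;  $i \neq j$    (5.29)

where $\bar y_i(t) = E\{y_i(t)\}$ and $\bar y_i$ is given by

    $\bar y_i(t) = a(q^{-1}) y_i(t-1) + B_{ii}(q^{-1}) u_i(t-1)$
    $\qquad + B_{ij}(q^{-1}) u_j(t-1) + D_i$ ,  $i,j = 1,2$ .    (5.30)

Let cov$\{x(t)\} = Q$, then

    $Q = E\{[x(t) - \bar x(t)][x(t) - \bar x(t)]^T\}$

    $Q = E\left\{ \begin{bmatrix} (B_0)_{11}^T Q_1 a_0 (y_1(t) - \bar y_1(t)) \\ (B_0)_{22}^T Q_2 a_0 (y_2(t) - \bar y_2(t)) \end{bmatrix} \begin{bmatrix} (B_0)_{11}^T Q_1 a_0 (y_1(t) - \bar y_1(t)) \\ (B_0)_{22}^T Q_2 a_0 (y_2(t) - \bar y_2(t)) \end{bmatrix}^T \right\}$

    $Q = a_0^2 \begin{bmatrix} (B_0)_{11}^T Q_1 W_{11} Q_1 (B_0)_{11} & (B_0)_{11}^T Q_1 W_{12} Q_2 (B_0)_{22} \\ (B_0)_{22}^T Q_2 W_{21} Q_1 (B_0)_{11} & (B_0)_{22}^T Q_2 W_{22} Q_2 (B_0)_{22} \end{bmatrix}$    (5.31)

where $W_{ij}$ denotes the i,j-th block of the noise covariance matrix W and
the fact that

    $E\{(y_i(t) - \bar y_i(t))(y_j(t) - \bar y_j(t))^T\} = W_{ij}$    (5.32)

has been used.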
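
The covariance of $x(t)$ can be assembled numerically in a few lines. The Python sketch below uses the observation that only the $a_0 y_i(t)$ term of $x_i(t)$ is random at time t, so that $x(t) - \bar x(t) = S\,(y(t) - \bar y(t))$ with $S$ block diagonal and blocks $a_0 (B_0)_{ii}^T Q_i$. The function name and the numbers for $a_0$, $B_0$ and $W$ are hypothetical; only the weights 5 and 10 follow Section 5.5.

```python
import numpy as np

def information_covariance(a0, B0_diag, Qw, W):
    """cov{x(t)} = S W S^T with S = blockdiag(a0 (B0)_ii^T Q_i).

    B0_diag : [(B0)_11, (B0)_22], the diagonal blocks of B_0
    Qw      : [Q_1, Q_2], the output weighting matrices
    W       : noise covariance of e(t)
    """
    blocks = [a0 * (np.atleast_2d(B).T @ np.atleast_2d(Qi))
              for B, Qi in zip(B0_diag, Qw)]
    p = sum(b.shape[0] for b in blocks)
    S = np.zeros((p, p))
    pos = 0
    for b in blocks:                 # place each block on the diagonal of S
        k = b.shape[0]
        S[pos:pos + k, pos:pos + k] = b
        pos += k
    return S @ np.asarray(W, dtype=float) @ S.T

# Scalar blocks, hypothetical a0, B0 and W.
cov_x = information_covariance(0.9, [0.3, 1.5], [5.0, 10.0],
                               [[4.0, 1.0], [1.0, 3.0]])
```

Writing the result as $S W S^T$ makes it evident that the covariance is symmetric and positive semidefinite, and positive definite whenever $W$ is and every diagonal block of $S$ is invertible.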

We are now ready to apply the result in [12, Theorem 2] to our game
problem.


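Theorem 5.2. The condition (5.16) in Theorem 5.1,

    $|\lambda_m(D_{11}^{-1} D_{12} D_{22}^{-1} D_{21})| < 1$ ,

is sufficient for the system (5.1) with cost functions (5.23) to admit a
unique Nash solution. The control law of each decision-maker is given by

    $u_i(t) = G_i \bar x(t) + F_i(\hat x_i(t) - \bar x(t))$ ,  $i = 1,2$    (5.33)

with

    $G_i = -(I - D_{ii}^{-1} D_{ij} D_{jj}^{-1} D_{ji})^{-1} D_{ii}^{-1} (C_i^T - D_{ij} D_{jj}^{-1} C_j^T)$ ,
    $i,j = 1,2$ ;  $i \neq j$    (5.34)

    $\hat x_i(t) = E\{x(t) \mid z_i(t)\}$
    $\qquad = \bar x(t) + Q H_i^T (H_i Q H_i^T + T_i)^{-1} (z_i(t) - H_i \bar x(t))$ ,
    $i = 1,2$    (5.35)

and $F_1$ is the unique solution to

    $F_1 + P F_1 L = M$    (5.36)

where

    $P = -D_{11}^{-1} D_{12} D_{22}^{-1} D_{21}$    (5.37)

    $L = Q H_1^T (H_1 Q H_1^T + T_1)^{-1} H_1 Q H_2^T (H_2 Q H_2^T + T_2)^{-1} H_2$    (5.38)

    $M = -D_{11}^{-1} C_1^T + D_{11}^{-1} D_{12} D_{22}^{-1} C_2^T Q H_2^T (H_2 Q H_2^T + T_2)^{-1} H_2$    (5.39)

and

    $F_2 = -D_{22}^{-1} [C_2^T + D_{21} F_1 Q H_1^T (H_1 Q H_1^T + T_1)^{-1} H_1]$ .    (5.40)

Proof. See proof of Theorem 2 in [12].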
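
The two computational steps of the theorem, the conditional mean of $x(t)$ given $z_i(t)$ and the linear matrix equation for $F_1$, can be sketched in Python as follows. This is only an illustration under stated assumptions: it reads the gain equation as $F_1 + P F_1 L = M$ and the estimate as $\hat x_i = \bar x + Q H_i^T (H_i Q H_i^T + T_i)^{-1}(z_i - H_i \bar x)$, and the function names and test matrices are hypothetical.

```python
import numpy as np

def conditional_mean(x_bar, cov_x, H, T, z):
    """x_hat = x_bar + cov_x H^T (H cov_x H^T + T)^{-1} (z - H x_bar)."""
    S = H @ cov_x @ H.T + T
    return x_bar + cov_x @ H.T @ np.linalg.solve(S, z - H @ x_bar)

def solve_gain_equation(P, L, M):
    """Solve F + P F L = M for F.

    Uses vec(P F L) = (L^T kron P) vec(F) with column-major vec, so the
    matrix equation becomes an ordinary linear system in vec(F).
    """
    m, p = M.shape
    A = np.eye(m * p) + np.kron(L.T, P)
    f = np.linalg.solve(A, M.reshape(-1, order="F"))
    return f.reshape(m, p, order="F")

# Hypothetical data: two-dimensional x, controller 1 measures the first entry.
cov_demo = np.array([[2.0, 0.5], [0.5, 1.0]])
H_demo = np.array([[1.0, 0.0]])
x_hat_demo = conditional_mean(np.array([1.0, -1.0]), cov_demo, H_demo,
                              np.zeros((1, 1)), np.array([3.0]))
```

In the noise-free case ($T_i = 0$) the estimate reproduces the measured component exactly and corrects the unmeasured one through the cross covariance.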


There is a close resemblance between Theorems 5.1 and 5.2. In both cases,
the control $u_i(t)$ is affine in $y_i(t)$. Moreover, as in Theorem 5.1, the
gain for the information, $F_i$, is dependent upon the system parameters
$a_0$ and $B_0$ but not the rest of $a(z)$ or $B(z)$.

Notice that in the formulation of the problem in (5.27), white noise
sequences $\{v_i(t)\}$ have been introduced into the system. We may consider
this disturbance as measurement error of $y_i(t)$ and/or noise in
transmitting past input-output data. On the other hand, if no such noise is
allowed into the system ($T_i = 0$), it is, intuitively, reasonable to expect
Theorems 5.1 and 5.2 to generate the same optimal policy. In fact, the
presence of a positive definite $T_i$ during the formulation stage is to
ensure that the matrices $(H_i Q H_i^T + T_i)$, $i = 1,2$, are non-singular.
If we assume Q is positive definite, by nature of the definition of $H_i$,
the problem of singularity can be avoided.

5.4.2 Self-Tuning Extended Static Decentralized Nash Games

In order to avoid the possibility of non-existence of solutions, we
assume the system parameters $a_0$ and $B_0$ are known while the rest of
$a(z)$, $B(z)$ and D are unknown. As in the previous approach, the system
parameters are estimated recursively using equations (5.19)-(5.22) assuming
identical algorithm and initial conditions for all decision-makers. The
estimates are then substituted into the equations of Theorem 5.2 in place
of the true parameters to obtain the optimal strategy. Hence, convergence
of this Nash policy depends on the convergence of the estimates, as
commented previously in Section 5.3.2.

A  block  diagram  of  the  closed-loop  system  is  shown  in  Figure  5.2. 


It is obvious that the two different approaches are almost identical
except for the noise sequences $\{v_i(t)\}$ ($i = 1,2$) that are introduced
in the second method.


5.5  Simulation  Studies  and  Conclusion 


The economic system presented in Section 3.4 is used to demonstrate
the performance of the algorithm. We assume the government, which controls
$u_1(t)$, has the consumption expenditure $y_1(t)$ as its measurement and the
Federal Reserve System (FRS), which controls $u_2(t)$, has the private
investment $y_2(t)$ as its measurement. Although this arrangement may not be
entirely realistic, we can interpret this case as a situation in which the
government places a strong emphasis on the current consumption while the
FRS focuses its entire energy on ensuring the targeted path of current
investment is followed. The cost function weighting matrices are given by

    $Q_1 = 5$ ,  $R_1 = 0.02$ ,  $Q_2 = 10$ ,  $R_2 = 0.08$ .

The same noise covariance and the same nominal $y^r$ as in the previous
simulations are used.

Simulation results indicate that both adaptive procedures can indeed
stabilize the system along the targeted growth path. The input-output
time responses of a typical run using the extended static Nash game
approach (with $T_i = 0$) are shown in Figures 5.3a-5.6a and the
corresponding trajectories using the constrained policy approach are shown
alongside in Figures 5.3b-5.6b. The two sets of input-output responses
indicate the two methods generate exactly the same optimal policy. Hence,
it is reasonable to assume the two methods are equivalent. There may be
situations in which the constrained approach offers simpler computations,
and thus holds an edge over the theoretically solid but cumbersome extended
static game approach. We can use this constrained approach with peace of
mind if we know there does exist a theoretical basis for the policy
structure.

The input-output responses of the same run with all the parameters
known are shown in Figures 5.7 and 5.8. The algorithms perform
satisfactorily when all the parameters are known, which is particularly
evident during the start up period. To compare the error in the optimal
policy, we let $u^k(t)$ denote the controls obtained with known parameters
and $u^u(t)$ denote the policy obtained with unknown parameters. The quantity
$e^u(t) = u^k(t) - u^u(t) = [e_1^u(t) \;\; e_2^u(t)]^T$ is shown in Figure
5.9. We see that the policy error $e^u(t)$ seems to be a zero mean quantity,
which indicates the algorithms are providing good controls even though the
parameter estimates in the simulation are far from converging.


CHAPTER  6 


CONCLUSION 

In this report, steady-state solutions are obtained for the
optimization of stochastic systems with unknown parameters and multiple
decision-makers each having his own objective. The solutions obtained
for these systems, or games, have the advantage of simplicity and easy
implementation and thus lend themselves to possible applications in a
variety of actual systems.

Two types of centralized stochastic adaptive games are considered:
the Nash game problem and the Leader-Follower game problem. The resulting
adaptive solutions for these games can be classified as those of the
implicit self-tuning type. It is established in this report that by a
judicious transformation, these game solutions can be made to resemble
closely the implicit self-tuning solution for the single-controller
single-objective case, thus endowing them with the desirable property of
simple implementation. In addition, convergence of these game problems
is established utilizing this close resemblance.

In Chapter 5, we proposed two explicit self-tuning type methods for
decentralized stochastic adaptive Nash games under the "one-step-delay
information sharing pattern". The first method is an ad hoc constraint
on the policy form while the second one is an extension of static Nash
game theory. Simulation results show that both methods generate identical
optimal policies and indicate that the two algorithms may be equivalent.
Even though results from simulation are satisfactory, a theoretical basis
for convergence of the decentralized Nash game problem still needs to be
established.
APPENDIX A

TRANSFORMATION OF SYSTEMS

Given a system governed by

    $A(q^{-1}) y(t) = B(q^{-1}) u(t-k-1) + C(q^{-1}) e(t) + D$    (A.1)

where D is a constant offset vector and $A(z)$, $B(z)$, $C(z)$ are matrix
polynomials. The vectors $y(t)$, $u(t)$, $e(t)$ and D are all of the same
dimension. Let $a(z)$ be the scalar polynomial formed by taking the
determinant of $A(z)$ and let $\tilde A(z)$ represent the adjoint of $A(z)$.
Hence, the inverse of $A(z)$, $A^{-1}(z)$, is given by

    $A^{-1}(z) = \frac{1}{a(z)} \tilde A(z)$ .    (A.2)

Premultiplying (A.1) by $a(z) A^{-1}(z)$ yields

    $a(q^{-1}) y(t) = \tilde A(q^{-1}) B(q^{-1}) u(t-k-1) + \tilde A(q^{-1}) C(q^{-1}) e(t)$
    $\qquad + \tilde A(q^{-1}) D$

or

    $a(q^{-1}) y(t) = \tilde B(q^{-1}) u(t-k-1) + \tilde C(q^{-1}) e(t) + \tilde D$    (A.3)

where $\tilde B(z) = \tilde A(z) B(z)$, $\tilde C(z) = \tilde A(z) C(z)$ and
$\tilde D = \tilde A(1) D$. The resulting system (A.3) has a scalar
polynomial operating on $y(t)$.
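
For a two-output system the transformation can be carried out by hand, since the adjoint of a 2 x 2 matrix only swaps the diagonal entries and negates the off-diagonal ones. The Python sketch below illustrates this with `numpy.polynomial.Polynomial`. The helper names are hypothetical, and the example matrix $A(z) = I - A_1 z$ is one reading of the lagged coefficients of the economic model in Appendix C, used here purely as an illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def adjugate_2x2(A):
    """Determinant a(z) and adjoint of a 2 x 2 polynomial matrix A(z)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    adj = [[a22, -a12], [-a21, a11]]
    return det, adj

def polymat_mul(X, Y):
    """Product of two square polynomial matrices given as nested lists."""
    n = len(X)
    return [[sum((X[i][k] * Y[k][j] for k in range(n)), Poly([0.0]))
             for j in range(n)] for i in range(n)]

def coeffs(p, n):
    """Coefficients of p padded with zeros to length n (for comparisons)."""
    c = np.zeros(n)
    c[:len(p.coef)] = p.coef
    return c

# A(z) = I - A_1 z, with A_1 taken from the lag terms of (C.1)-(C.2).
A_example = [[Poly([1.0, -0.9266]), Poly([0.0, 0.0203])],
             [Poly([0.0, -0.1527]), Poly([1.0, -0.3806])]]
det, adj = adjugate_2x2(A_example)
check = polymat_mul(A_example, adj)   # should equal a(z) times the identity
```

Multiplying $A(z)$ by its adjoint and comparing with $a(z) I$, as in the last line, is a cheap consistency check before the transformed polynomials $\tilde B$, $\tilde C$ and $\tilde D$ are formed.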


APPENDIX  B 



PROOF  OF  THEOREM  3.1 


Consider the system governed by

    $a(q^{-1}) y(t) = B(q^{-1}) u(t-k-1) + C(q^{-1}) e(t) + D$ .    (B.1)

The cost function associated with the i-th controller is given by

    $J_i = E\{[y(t+k+1) - y^r(t+k+1)]^T Q_i [y(t+k+1) - y^r(t+k+1)]$
    $\qquad + [u(t) - u(t-1)]^T R_i [u(t) - u(t-1)]\}$ ,  $i = 1,2$ .    (B.2)

From the proof of Theorem 2.1, we can transform (B.1) into the following
prediction model form

    $c(q^{-1}) y^*(t+k+1|t) = G(q^{-1}) \tilde C(q^{-1}) y(t) + F(q^{-1}) \tilde C(q^{-1}) B(q^{-1}) u(t)$
    $\qquad + F(q^{-1}) \tilde C(q^{-1}) D$    (B.3)

where

    $y^*(t+k+1|t) = y(t+k+1) - F(q^{-1}) e(t+k+1)$ .    (B.4)

Assuming the existence of $\partial J_i / \partial u_i(t)$, we substitute $y^*$
into (B.2) and set $\partial J_i / \partial u_i(t)$ to zero to obtain

    $0 = B_0^{(i)T} Q_i (y^*(t+k+1|t) - y^r(t+k+1))$
    $\qquad + R_i^{(i)T} (u(t) - u(t-1))$ ,  $i = 1,2,\dots,N$ .    (B.5)

Stacking up the N equations in (B.5), we have

    $0 = M(y^*(t+k+1|t) - y^r(t+k+1)) + H(u(t) - u(t-1))$    (B.6)

where M and H are defined in (3.9a) and (3.9f) respectively. Multiplying
(B.6) by $c(z)$ and combining the resulting equation with (B.3), we have

    $M G(q^{-1}) \tilde C(q^{-1}) y(t) + [M F(q^{-1}) \tilde C(q^{-1}) B(q^{-1}) + (1 - q^{-1}) c(q^{-1}) H] u(t)$
    $\qquad + M F(q^{-1}) \tilde C(q^{-1}) D - c(q^{-1}) M y^r(t+k+1) = 0$

as stated in the theorem.
APPENDIX C

ECONOMIC MODEL

The economic model used for the simulation study is taken from [15,
p. 272]. It is given in the following form

    $C_t = 0.9266\,C_{t-1} - 0.0203\,I_{t-1} + 0.3190\,G_t + 0.4206\,M_t - 63.2386$    (C.1)

    $I_t = 0.1527\,C_{t-1} + 0.3806\,I_{t-1} - 0.0735\,G_t + 1.538\,M_t - 210.8994$    (C.2)

where $C_t$ denotes consumption expenditures, $I_t$ is private investment
expenditure, $G_t$ is government expenditure and $M_t$ is the money supply.
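
The two equations can be stepped forward directly. The following Python sketch is a plain transcription of (C.1)-(C.2), read as $C_t = 0.9266\,C_{t-1} - 0.0203\,I_{t-1} + 0.3190\,G_t + 0.4206\,M_t - 63.2386$ and $I_t = 0.1527\,C_{t-1} + 0.3806\,I_{t-1} - 0.0735\,G_t + 1.538\,M_t - 210.8994$; the function name and the inputs in the usage line are hypothetical and are not simulation settings from the report.

```python
def economic_model_step(C_prev, I_prev, G, M):
    """One quarter of the two-equation model (C.1)-(C.2).

    C_prev, I_prev : last quarter's consumption and private investment
    G, M           : current government expenditure and money supply
    All quantities are in billions of dollars.
    """
    C = 0.9266 * C_prev - 0.0203 * I_prev + 0.3190 * G + 0.4206 * M - 63.2386
    I = 0.1527 * C_prev + 0.3806 * I_prev - 0.0735 * G + 1.538 * M - 210.8994
    return C, I

# With all inputs at zero only the constant offsets remain.
C0, I0 = economic_model_step(0.0, 0.0, 0.0, 0.0)
```

Iterating this function with a noise term added to each output gives an open-loop counterpart of the perturbed system used in the simulations.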

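Let $y_1(t) = C_t$, $y_2(t) = I_t$, $u_1(t-1) = G_t$, and $u_2(t-1) = M_t$. We
assume the current $G_t$ and $M_t$ are the result of, and equal to, the
desired levels that were specified in the previous time step, thus the time
lag in the definition of $u_1$ and $u_2$ [46]. The resulting model in terms
of y and u is given in the following matrix polynomial form

    $\left( I - \begin{bmatrix} 0.9266 & -0.0203 \\ 0.1527 & 0.3806 \end{bmatrix} q^{-1} \right) y(t) = \begin{bmatrix} 0.3190 & 0.4206 \\ -0.0735 & 1.538 \end{bmatrix} u(t-1) + \begin{bmatrix} -63.2386 \\ -210.8994 \end{bmatrix}$ .

Using the transformation technique given in Appendix A, we have the
transformed system

    $a(q^{-1}) y(t) = B(q^{-1}) u(t-1) + D$ .    (C.5)

For simulation, we assume the system is perturbed by zero mean white noise
e(t), that is,

    $a(q^{-1}) y(t) = B(q^{-1}) u(t-1) + D + e(t)$    (C.6)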

with 

54 

E{e(t)e^(t)}  - 

12 


The weighting matrices for simulation in Chapters 3 and 4 are


All  input-output  variables  are  in  billions  of  dollars  and  each  time 
step  t  is  one  quarter. 
APPENDIX D

PROOF OF THEOREM 4.1

Consider the system

    $a(q^{-1}) y(t) = B(q^{-1}) u(t-k-1) + C(q^{-1}) e(t) + D$ .    (D.1)

The cost function associated with the i-th controller is

    $J_i = E\{[y(t+k+1) - y^r(t+k+1)]^T Q_i [y(t+k+1) - y^r(t+k+1)]$
    $\qquad + [u(t) - u(t-1)]^T R_i [u(t) - u(t-1)]\}$ ,  $i = 1,2$ .    (D.2)

From the proof of Theorem 2.1, it is possible to arrive at the following
prediction model form of (D.1),

    $c(q^{-1}) y^*(t+k+1|t) = G(q^{-1}) \tilde C(q^{-1}) y(t) + F(q^{-1}) \tilde C(q^{-1}) B(q^{-1}) u(t)$
    $\qquad + F(q^{-1}) \tilde C(q^{-1}) D$    (D.3)

where

    $y^*(t+k+1|t) = y(t+k+1) - F(q^{-1}) e(t+k+1)$ .    (D.4)

The $y^*$ is substituted into the cost function (D.2) and the necessary
condition for minimum is then derived as follows.

Follower. $J_2$ is the cost function associated with the follower.
The necessary condition for the follower is

    $0 = \partial J_2 / \partial u_2(t) = B_0^{(2)T} Q_2 [y^*(t+k+1|t) - y^r(t+k+1)]$
    $\qquad + R_2^{(2)T} [u(t) - u(t-1)]$    (D.5)

    $\quad = M_2 [y^*(t+k+1|t) - y^r(t+k+1)] + H_2 [u(t) - u(t-1)]$ .    (D.6)

Leader. $J_1$ is the cost function associated with the leader. The
leader optimizes taking into consideration the possible reaction of the
follower. Thus, he will append the follower's optimization equation to
that of his, that is, he will minimize, with respect to $u_1$ and $u_2$, a
cost function J given by

    $J = J_1 + \lambda_t^T \, \partial J_2 / \partial u_2(t)$    (D.7)

where $\lambda_t$ is a Lagrange multiplier. The necessary conditions of
minimum for the leader are $\partial J / \partial u_i(t) = 0$, for $i = 1,2$.
For simplicity of derivation, we will assume $u_1$ and $u_2$ to be scalar
valued. Hence, we have


T  T 

-  BQ^^^Q^Cy*(t-Hrt-l|t)  -  y’'(t+k+l)3  +  (R2>12> 


+  R/^^ru(t)  -  u(t-l)] 


(D.8) 


-  Bo^'‘^Q2Cy  (t-Hc+llt^  -  y’'(t+k+l)]  +  +  (R2)22) 

T 

+  -  u(t-l)]  .  (D.9) 

Equations (D.6), (D.8), and (D.9) yield

    $M[y^*(t+k+1|t) - y^r(t+k+1)] + H[u(t) - u(t-1)] = 0$    (D.10)

where M and H are as defined in (4.6e) and (4.6f) respectively.

Combining (D.3) and (D.10), we have

    $M G(q^{-1}) \tilde C(q^{-1}) y(t) + [M F(q^{-1}) \tilde C(q^{-1}) B(q^{-1}) + (1 - q^{-1}) c(q^{-1}) H] u(t)$
    $\qquad + M F(q^{-1}) \tilde C(q^{-1}) D - c(q^{-1}) M y^r(t+k+1) = 0$

as given in (4.5) of Theorem 4.1.
APPENDIX E

PROOF OF THEOREM 5.1

Consider the system governed by (5.1)

    $y(t+1) = a(q^{-1}) y(t) + B(q^{-1}) u(t) + e(t+1) + D$    (E.1)

with the cost functions (5.14)

    $\bar J_i = E\{u_i^T(t) D_{ii} u_i(t) + 2u_i^T(t) D_{ij} u_j(t) + 2y_i^T(t) C_i u_i(t)$
    $\qquad + 2x_i^T(t) u_i(t)\}$ ,  $i = 1,2$ .    (E.2)

The policy is constrained to be of the form

    $u_i(t) = G_i y_i(t) + g_i$ .    (E.3)

Without loss of generality, let us consider $\bar J_1$. Substituting $u_i$ in
(E.3) into $\bar J_1$ yields

    $\bar J_1 = E\{y_1^T(t) G_1^T D_{11} G_1 y_1(t) + g_1^T D_{11} g_1 + 2y_1^T(t) G_1^T D_{11} g_1$
    $\qquad + 2[y_1^T(t) G_1^T D_{12} G_2 y_2(t) + y_1^T(t) G_1^T D_{12} g_2$
    $\qquad + g_1^T D_{12} G_2 y_2(t) + g_1^T D_{12} g_2] + 2[y_1^T(t) C_1 G_1 y_1(t)$
    $\qquad + y_1^T(t) C_1 g_1 + x_1^T(t) G_1 y_1(t) + x_1^T(t) g_1]\}$ .    (E.4)

Denote

    $E\{y_i(t) y_j^T(t)\} = P_{ij}(t)$    (E.5)

    $E\{y_i(t)\} = \bar y_i(t)$ .    (E.6)
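Then, taking expectation of the terms in (E.4), we have

    $\bar J_1 = \mathrm{trace}[G_1^T D_{11} G_1 P_{11}(t)] + g_1^T D_{11} g_1 + 2\bar y_1^T(t) G_1^T D_{11} g_1$
    $\qquad + 2\,\mathrm{trace}[G_1^T D_{12} G_2 P_{21}(t)] + 2\bar y_1^T(t) G_1^T D_{12} g_2 + 2g_1^T D_{12} G_2 \bar y_2(t)$
    $\qquad + 2g_1^T D_{12} g_2 + 2\,\mathrm{trace}[C_1 G_1 P_{11}(t)] + 2\bar y_1^T(t) C_1 g_1$
    $\qquad + 2x_1^T(t) G_1 \bar y_1(t) + 2x_1^T(t) g_1$ .    (E.7)

The following formulae are then used to evaluate the necessary conditions
for minimum, $\partial \bar J_1 / \partial G_1 = 0$ and
$\partial \bar J_1 / \partial g_1 = 0$:

    $\frac{\partial}{\partial Z} \mathrm{tr}[NZ] = N^T$

    $\frac{\partial}{\partial Z} \mathrm{tr}[NZ^T] = N$

    $\frac{\partial}{\partial Z} \mathrm{tr}[NZL] = N^T L^T$

    $\frac{\partial}{\partial Z} \mathrm{tr}[Z^T L Z N] = L^T Z N^T + L Z N$ .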


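Hence, we have

    $0 = \partial \bar J_1 / \partial G_1$
    $\quad = D_{11}^T G_1 P_{11}^T(t) + D_{11} G_1 P_{11}(t) + 2D_{11} g_1 \bar y_1^T(t) + 2D_{12} G_2 P_{21}(t)$
    $\qquad + 2D_{12} g_2 \bar y_1^T(t) + 2C_1^T P_{11}(t) + 2x_1(t) \bar y_1^T(t)$

or

    $0 = D_{11} G_1 P_{11}(t) + D_{11} g_1 \bar y_1^T(t) + D_{12} G_2 P_{21}(t) + D_{12} g_2 \bar y_1^T(t)$
    $\qquad + C_1^T P_{11}(t) + x_1(t) \bar y_1^T(t)$    (E.8)

    $0 = \partial \bar J_1 / \partial g_1$
    $\quad = 2D_{11} g_1 + 2D_{11} G_1 \bar y_1(t) + 2D_{12} G_2 \bar y_2(t) + 2D_{12} g_2$
    $\qquad + 2C_1^T \bar y_1(t) + 2x_1(t)$

or

    $0 = D_{11} g_1 + D_{11} G_1 \bar y_1(t) + D_{12} G_2 \bar y_2(t) + D_{12} g_2$
    $\qquad + C_1^T \bar y_1(t) + x_1(t)$ .    (E.9)

Similarly, from $\partial \bar J_2 / \partial G_2 = 0$ and
$\partial \bar J_2 / \partial g_2 = 0$ we obtain

    $0 = D_{22} G_2 P_{22}(t) + D_{22} g_2 \bar y_2^T(t) + D_{21} G_1 P_{12}(t) + D_{21} g_1 \bar y_2^T(t)$
    $\qquad + C_2^T P_{22}(t) + x_2(t) \bar y_2^T(t)$    (E.10)

    $0 = D_{22} g_2 + D_{22} G_2 \bar y_2(t) + D_{21} G_1 \bar y_1(t) + D_{21} g_1$
    $\qquad + C_2^T \bar y_2(t) + x_2(t)$ .    (E.11)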

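Since $\bar y(t)$ and $E\{y(t) y^T(t)\}$ are required to solve for $G_i$, $g_i$
($i = 1,2$) in (E.8)-(E.11), we show how these terms are computed.

At time t, past u, y are known, then

    $E\{y_i(t)\} = E\{a(q^{-1}) y_i(t-1) + B_{ii}(q^{-1}) u_i(t-1)$
    $\qquad + B_{ij}(q^{-1}) u_j(t-1) + e_i(t) + D_i\}$
    $\quad = a(q^{-1}) y_i(t-1) + B_{ii}(q^{-1}) u_i(t-1)$
    $\qquad + B_{ij}(q^{-1}) u_j(t-1) + D_i$ .    (E.12)

    $P(t) = \begin{bmatrix} P_{11}(t) & P_{12}(t) \\ P_{21}(t) & P_{22}(t) \end{bmatrix} = E\{y(t) y^T(t)\} = \mathrm{variance}\{y(t)\}$ .

Since variance$\{y(t)\}$ = covariance$\{y(t)\}$ + $\bar y(t) \bar y^T(t)$, thus

    $P(t) = E\{(y(t) - \bar y(t))(y(t) - \bar y(t))^T\} + \bar y(t) \bar y^T(t)$

    $\quad = E\{e(t) e^T(t)\} + \bar y(t) \bar y^T(t)$

    $\quad = W + \bar y(t) \bar y^T(t)$ .    (E.13)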


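Postmultiplying (E.9) by the term $\bar y_1^T(t)$ and subtracting the
resulting equation from (E.8) yields

    $D_{11} G_1 [P_{11}(t) - \bar y_1(t) \bar y_1^T(t)] + D_{12} G_2 [P_{21}(t) - \bar y_2(t) \bar y_1^T(t)]$
    $\qquad = -C_1^T (P_{11}(t) - \bar y_1(t) \bar y_1^T(t))$

or

    $D_{11} G_1 W_{11} + D_{12} G_2 W_{21} = -C_1^T W_{11}$ .    (E.14)

Similarly, if $\bar y_2^T(t)$ is postmultiplied to (E.11) and then subtracted
from (E.10), the following is obtained

    $D_{21} G_1 W_{12} + D_{22} G_2 W_{22} = -C_2^T W_{22}$ .    (E.15)

After some manipulations with (E.14) and (E.15), we have

    $G_i - K_i G_i L_i = S_i$ ,  $i = 1,2$    (E.16)

with

    $K_i = D_{ii}^{-1} D_{ij} D_{jj}^{-1} D_{ji}$    (E.17)

    $L_i = W_{ij} W_{jj}^{-1} W_{ji} W_{ii}^{-1}$ ,  $i,j = 1,2$    (E.18)

    $S_i = -D_{ii}^{-1} C_i^T + D_{ii}^{-1} D_{ij} D_{jj}^{-1} C_j^T W_{ji} W_{ii}^{-1}$    (E.19)

as stated in (5.17) of Theorem 5.1. After the $G_i$'s are determined, $g_1$
and $g_2$ are solved from (E.9) and (E.11) respectively:

    $g_i + D_{ii}^{-1} D_{ij} g_j = -D_{ii}^{-1} [D_{ii} G_i \bar y_i(t) + D_{ij} G_j \bar y_j(t)$
    $\qquad + C_i^T \bar y_i(t) + x_i(t)]$ ,  $i,j = 1,2$    (E.20)

as stated in (5.18) of Theorem 5.1. Sufficient conditions for existence
of a solution to (E.16) are discussed in the proof of Theorem 3 in [11] and
Corollary 1.1 in [12] and are stated in (5.16).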